Defocus operations for a virtual display with focus and defocus determined based on camera settings

ABSTRACT

Methods and systems are presented for generating a virtual scene usable in a captured scene with focus settings that take into account camera position. Virtual objects displayed in a virtual scene that is presented on a display wall and captured in a scene can be presented in the virtual scene with a focus or defocus that is dependent on a virtual object position in the virtual scene and a position of a camera relative to the display wall. Defocusing of virtual objects can be such that an eventual defocus when captured by the camera corresponds to what would be a defocus of an object distant from the camera by a distance that represents a combination of a first distance from the camera to the display wall and a second distance being a virtual distance in the virtual scene from the virtual object to a virtual camera plane of the virtual scene.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit under 35 U.S.C. §119(e) of U.S. Provisional Pat. Application No. 63/287,920, filed Dec. 9, 2021. The entire disclosure of the application recited above is hereby incorporated by reference, as if set forth in full in this document, for all purposes.

FIELD

The present disclosure generally relates to methods and apparatus for computerized image processing and more particularly to processing a computer-generated image related to emulating focus/defocus effects for virtual objects displayed on a display wall.

BACKGROUND

In computer-generated image generation and animation of imagery comprising images and/or video sequences, there might be a desire to incorporate a video wall while capturing a live action scene of an actor. In a detailed live action scene that incorporates background animated elements, it could be difficult to properly coordinate a live actor with those background elements. Furthermore, if it is desirable to project lights, colors, or other real-world effects on a live actor, it may be tedious to ensure that those effects are properly aligned between the live actor and the animated background imagery that is later added to a scene. Animated objects that may be placed in a background scene and/or with a live action scene can comprise many individual objects, which may have their own lighting effects, colors, and/or interactions with live actors. For example, a scene involving an explosion or other intense light may have features that cause colors to be projected onto live actors. Background scenes may also involve stage elements and/or creatures that interact with live actors, such as by acting as an environment and/or engaging with live actors. As such, it might be useful to deploy an image generation device to display imagery in a stage environment whereby a camera capturing the stage scene captures actions and presence of the live actor, stage props, etc., as well as capturing what is displayed by the image generation device, with what is displayed by the image generation device giving off light that might impinge on the live actor, stage props, etc.

An image generation device might be placed relative to a camera such that the camera captures a scene that includes the image generation device while the image generation device is displaying imagery or captures a scene that is in part illuminated by light emitted by the image generation device. The image generation device may occupy part of a background of a stage scene or might encompass all of the background of the stage scene. The image generation device may be planar and might be referred to as a video wall.

Physical stage sets can be expensive and time-consuming to produce, which may be prohibitive for short scenes or scenes in a project with a limited schedule or budget. Conversely, painted backdrops or matte paintings may not provide desired realism. It is commonplace in video production to film actors and foreground objects on a green-screen stage, to which digital backgrounds, characters, and special effects can be added later. However, actors may have difficulty reacting to objects, events, or characters they cannot see. Furthermore, the lighting effects produced by explosions, moving objects, and other changes in the digital video may be difficult to reproduce on the physical stage, thus impairing realism. For this reason, some video production now employs a display wall (also known as a video wall, LED wall, etc.) at the rear of the live-action stage, which is capable of displaying rendered digital backgrounds, characters, effects, and lighting in real time.

For simple scenes and/or backgrounds, modeling or drawing individual background objects and/or scenes might not be difficult. However, as viewers have come to expect more complex visuals, there is a need for procedural processing, rendering, and adjusting of backgrounds so that they appear more realistic. Further, stereoscopic imaging may be used to capture scenes as they would be viewed from different angles, and therefore add depth and 3D elements to the captured images and video. When 2D elements are added to a display wall or other 2D object, the display wall may not appear realistic alongside 3D live actors when a live scene is stereoscopically captured.

Furthermore, when a virtual object is displayed in the display wall, it is necessarily located “behind” the display wall in a virtual space, and thus has a “depth” relative to a physical camera that is equal to, on a straight line connecting the physical camera with the virtual object, the physical distance or stage environment distance between the camera and the video wall, plus the virtual distance between the video wall and the object. The lens characteristics of the physical camera may dictate that an object at such a depth “should” be out of focus (i.e., would be out of focus if it were a physical object at that physical depth). Moreover, multiple objects at different depths within the scene “should” be defocused to different degrees. However, defocus of objects or scene elements shown on the display wall is currently performed by altering the focal characteristics of a physical lens on the physical camera, and thus all virtual objects and virtual scene elements are defocused to the same degree, based on the distance between the camera and the display wall.

In some implementations, a method and apparatus for emulating a depth of field effect of a physical camera and allowing portions of a scene to be defocused post-rendering, in real time, might be desirable.

SUMMARY

A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a computer-implemented method of generating a virtual scene rendering usable in a captured scene. The computer-implemented method includes determining a camera position of a camera, light sensor, image capture device, etc., in a stage environment; determining a display position of a virtual scene display in the stage environment; determining a virtual focus model, based at least on a relative positioning as between the camera position and the display position, where the virtual focus model is represented by a focus model data structure defining how focus should be applied to virtual scene elements in the virtual scene to be presented on the virtual scene display while the camera captures imagery of the stage environment including the virtual scene display; determining a depth value for a given virtual scene element, where the depth value corresponds to a virtual distance in the virtual scene between the given virtual scene element and a virtual camera viewpoint and/or the virtual scene display; determining an adjusted focus, in the virtual scene, of the given virtual scene element based on at least the depth value and the relative positioning; and rendering the virtual scene taking into account the adjusted focus. Other implementations of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
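
For illustration only, the following Python sketch shows one way the last two steps (determining and applying an adjusted focus) might be approximated, using a standard thin-lens blur-circle formula and treating the presumed distance as the sum of the camera-to-display distance and the virtual depth, as one implementation below describes. The function names, parameters, and units are assumptions for this sketch, not elements of the claimed method.

```python
# A minimal sketch (not the patented method itself) of computing an adjusted
# defocus for a virtual scene element. stage_distance_m, virtual_depth_m,
# focal_length_mm, f_number, and focus_distance_m are illustrative names.

def blur_circle_mm(object_distance_m: float,
                   focus_distance_m: float,
                   focal_length_mm: float,
                   f_number: float) -> float:
    """Thin-lens circle-of-confusion diameter (mm) for an object at a given distance."""
    f = focal_length_mm
    s1 = focus_distance_m * 1000.0   # focus distance, mm
    s2 = object_distance_m * 1000.0  # object distance, mm
    aperture = f / f_number          # aperture diameter, mm
    return aperture * abs(s2 - s1) / s2 * f / (s1 - f)

def adjusted_defocus(stage_distance_m: float,
                     virtual_depth_m: float,
                     focus_distance_m: float,
                     focal_length_mm: float,
                     f_number: float) -> float:
    """Defocus the element as if it sat at (camera-to-wall) + (wall-to-element) distance."""
    presumed_distance_m = stage_distance_m + virtual_depth_m  # sum, per one implementation
    return blur_circle_mm(presumed_distance_m, focus_distance_m,
                          focal_length_mm, f_number)

# Example: camera 4 m from the wall, element 6 m "behind" the wall,
# 50 mm lens at f/2.8 focused on an actor 3 m away.
print(adjusted_defocus(4.0, 6.0, 3.0, 50.0, 2.8))
```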

Implementations may include one or more of the following features. In some implementations, the virtual focus model includes data specifying at least one of a depth of field for the camera, a focal point for the camera, and/or a focal length of a lens of the camera. In some implementations, the virtual focus model includes data specifying a set of bokeh, vignette, or lemmoning effects for one or more virtual scene elements to be applied to the virtual scene elements when rendering the virtual scene. In some implementations, the captured scene includes an optical view of the stage environment, where the stage environment is a movie set, and where the virtual scene display is positioned in the stage environment further from the camera than at least one live actor visible in a camera scene captured by the camera. In some implementations, the adjusted focus includes a defocus of the given virtual scene element to, at least approximately, match a presumed defocus the given virtual scene element would have if the given virtual scene element were present at a presumed distance from the camera that corresponds to a function of a first distance from the camera to the virtual scene display and the depth value of the given virtual scene element. In some implementations, the function of the first distance and the depth value is a sum of the first distance and the depth value. In some implementations, determining the camera position includes reading data from camera position sensors placed on the camera. In some implementations, determining the display position includes reading data from display position sensors placed on the virtual scene display. In some implementations, determining the camera position and determining the display position include receiving manually entered position data. In some implementations, determining the virtual focus model is based, at least in part, on predetermined parameters defining lens characteristics of a lens used on the camera. In some implementations, the virtual scene display includes an LED wall. In some implementations, the LED wall is positioned as a background in the stage environment and spans an entirety of a scene captured by the camera. In some implementations, the LED wall is planar, piecewise planar, or where at least a portion of the LED wall has a curved portion. In some implementations, the virtual scene display is planar and perpendicular to a camera lens axis. Some implementations include a non-transitory computer-readable storage medium storing instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer-readable medium carrying instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer system including: one or more processors; and a storage medium storing instructions, which when executed by the one or more processors, cause the computer system to implement the method. Some implementations include a carrier medium carrying image data that includes pixel information generated according to the method.

One general aspect includes a computer-implemented method of determining a virtual focus model for a camera apparatus. The method includes generating a first calibration image; determining a camera position of the camera apparatus in a stage environment; determining a display position of a virtual scene display in the stage environment; setting the camera apparatus to have a first state of an optical path; capturing a first captured image of the stage environment with the camera apparatus in the first state, where the first captured image includes a first view of at least a first portion of the virtual scene display displaying the first calibration image; setting the camera apparatus to have a second state of the optical path; capturing a second captured image of the stage environment with the camera apparatus in the second state, where the second captured image includes a view of at least a second portion of the virtual scene display displaying the first calibration image; comparing the first captured image and the second captured image to derive a focus parameter, based at least in part on the camera position and the display position; and providing the focus parameter as a computer-readable portion of the virtual focus model. Other implementations of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
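
As a hedged illustration of the comparison step, one simple way to derive a focus parameter from the two captured calibration images is to compare a sharpness metric over the region of each capture that shows the calibration image. The crop region, the Laplacian-variance metric, and the returned ratio are illustrative assumptions, not the specific comparison defined by this method.

```python
# A minimal sketch, assuming numpy and grayscale captures as 2-D float arrays.
# "region" is a hypothetical (row, col, height, width) crop containing the
# calibration image; the sharpness ratio stands in for the derived focus parameter.
import numpy as np

def laplacian_variance(img: np.ndarray) -> float:
    """Sharpness metric: variance of a 4-neighbor finite-difference Laplacian."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def derive_focus_parameter(first_capture: np.ndarray,
                           second_capture: np.ndarray,
                           region: tuple[int, int, int, int]) -> float:
    """Compare the two captures over the calibration region; a value above 1
    means the second optical-path state rendered the pattern sharper."""
    r, c, h, w = region
    sharp_first = laplacian_variance(first_capture[r:r + h, c:c + w])
    sharp_second = laplacian_variance(second_capture[r:r + h, c:c + w])
    return sharp_second / max(sharp_first, 1e-12)
```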

Implementations may include one or more of the following features. In some implementations, the focus parameter includes one or more of a depth of focus for the camera apparatus, a focal point for the camera apparatus, or an aberration effect of the camera apparatus. In some implementations, the camera apparatus includes a stereoscopic optical path and at least two image capture elements. In some implementations, the computer-implemented method further includes: adjusting a focus of the camera apparatus continuously through a rack focus range; and recording image capture of the stage environment as the focus is adjusted. Examples can be found in U.S. Pat. Application No. 17/378,503 to Hayes et al., “SMOOTHLY CHANGING A FOCUS OF A CAMERA BETWEEN MULTIPLE TARGET OBJECTS”, filed Jul. 16, 2021, hereby incorporated by reference as though fully set forth herein. In some implementations, the computer-implemented method further includes adjusting the first calibration image while recording the image capture of the stage environment as the focus of the camera apparatus is adjusted. In some implementations, adjusting the focus of the camera apparatus or setting the camera apparatus to have a second state of the optical path includes adjusting a focus of the camera apparatus by moving the camera apparatus. In some implementations, adjusting the focus of the camera apparatus or setting the camera apparatus to have a second state of the optical path includes adjusting a motion control head of the camera apparatus or manually adjusting a focus of the camera apparatus with a positional encoder. In some implementations, the first captured image includes a first view of at least a first portion of the virtual scene display displaying the first calibration image and a second calibration image, and the second captured image includes a view of at least the second portion of the virtual scene display displaying the first calibration image and the second calibration image. In some implementations, the first calibration image includes at least one of lines, circles, polygons, and/or photographic images. In some implementations, the first calibration image includes shapes of varying sizes, line weights, or colors. In some implementations, the first calibration image is two-dimensional. In some implementations, the first calibration image includes two-dimensional elements at different depths or plane orientations. In some implementations, the first calibration image is three-dimensional. In some implementations, the first calibration image is positioned in the virtual scene at a depth of a surface of the virtual scene display. In some implementations, the first calibration image is positioned in the virtual scene at a depth different than a surface of the virtual scene display. In some implementations, the first calibration image is animated. Some implementations include a non-transitory computer-readable storage medium storing instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer-readable medium carrying instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer system including: one or more processors; and a storage medium storing instructions, which when executed by at least one processor, cause the computer system to implement the method. Some implementations include a carrier medium carrying image data that includes pixel information generated according to the method.

One general aspect includes a computer-implemented method of generating a rendering of a virtual scene. The method includes determining a camera position of a camera in a stage environment that is to be used to capture a captured scene; determining a display position of a virtual scene display in the stage environment; determining a set of depth slices, where a depth slice of the set of depth slices represents a subregion of a virtual scene space; determining a blur factor for the depth slice based at least in part on the camera position, the display position, and a depth value or depth range for the subregion of the virtual scene space represented by the depth slice; determining which of a plurality of virtual scene elements can be assigned to which depth slices of the set of depth slices; and rendering the virtual scene taking into account blur factors for virtual scene elements based at least in part on depth slices of the virtual scene elements. Other implementations of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
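
A rough, non-authoritative sketch of how virtual scene elements might be binned into depth slices and given per-slice blur factors appears below; the slice boundaries, the blur heuristic, and the data shapes are illustrative assumptions only.

```python
# A minimal sketch, assuming elements are (name, virtual_depth_m) pairs and
# that a slice's blur factor grows with its distance from the camera's focus
# distance. The heuristic below is illustrative, not the claimed computation.
from bisect import bisect_right

def build_depth_slices(boundaries_m):
    """Depth slices as half-open intervals [b[i], b[i+1]) in virtual-scene depth."""
    return list(zip(boundaries_m[:-1], boundaries_m[1:]))

def slice_blur_factor(slice_interval, camera_to_wall_m, focus_distance_m):
    """Blur grows with distance between the slice center (as seen by the camera)
    and the camera's focus distance."""
    near, far = slice_interval
    slice_center_from_camera = camera_to_wall_m + 0.5 * (near + far)
    return abs(slice_center_from_camera - focus_distance_m)

def assign_elements_to_slices(elements, boundaries_m):
    """Map each (name, virtual_depth_m) element to the index of its depth slice."""
    return {name: bisect_right(boundaries_m, depth) - 1 for name, depth in elements}

# Example usage with hypothetical values.
boundaries = [0.0, 2.0, 5.0, 10.0, 50.0]   # virtual depths behind the wall, meters
slices = build_depth_slices(boundaries)
elements = [("tree", 12.0), ("creature", 3.0)]
assignment = assign_elements_to_slices(elements, boundaries)
blurs = [slice_blur_factor(s, camera_to_wall_m=4.0, focus_distance_m=6.0) for s in slices]
```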

Implementations may include one or more of the following features. In some implementations, the method further includes determining a scaling between a first distance measurement in the stage environment and a second distance measurement in the virtual scene. In some implementations, each blur factor is a function of a virtual distance in the virtual scene space and/or a distance measurement in the stage environment from a focal point of the camera. In some implementations, determining each blur factor includes computing a transparency for pixels associated with object edges associated with a blurred virtual scene element. In some implementations, each blur factor is applied as part of the rendering of the virtual scene. In some implementations, each blur factor is applied to pixels of the rendered virtual scene that are associated with a respective blurred virtual scene element. In some implementations, the blur factor of the respective virtual scene element is updated when a change occurs to at least one of the camera position, the depth slice of the respective virtual scene element, and/or a focus parameter of the camera. In some implementations, each depth slice of the set of depth slices is of the same thickness as the other depth slices of the set of depth slices. In some implementations, at least one depth slice of the set of depth slices is of a different thickness from at least one other depth slice of the set of depth slices. In some implementations, the depth slices of the set of depth slices are contiguous. In some implementations, at least one depth slice of the set of depth slices is not contiguous with any other depth slice of the set of depth slices. In some implementations, the depth slices of the set of depth slices are planar. In some implementations, the depth slices of the set of depth slices are spherical sections. Some implementations include a non-transitory computer-readable storage medium storing instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer-readable medium carrying instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer system including: one or more processors; and a storage medium storing instructions, which when executed by the one or more processors, cause the computer system to implement the method. Some implementations include a carrier medium carrying image data that includes pixel information generated according to the method.

One general aspect includes a computer-implemented method of generating a virtual scene rendering usable in a captured scene. The method includes determining a camera position of a camera in a stage environment; determining a mapping of a plurality of subregions of a virtual scene display in the stage environment to corresponding positions in the stage environment; for a given virtual scene element: a) determining a corresponding subregion of the plurality of subregions for the given virtual scene element, where the corresponding subregion corresponds to where on the virtual scene display the given virtual scene element would, at least in part, appear; b) determining a stage subregion depth value for the corresponding subregion, where the stage subregion depth value represents a distance from the corresponding subregion to at least one of the camera position and a focal point of the camera based, at least in part, on the mapping of the plurality of subregions to the corresponding positions in the stage environment; c) determining a virtual subregion depth value based, at least in part, on a depth value for the given virtual scene element; and d) determining a blur factor for the corresponding subregion based at least in part on the stage subregion depth value and the virtual subregion depth value; and rendering the virtual scene taking into account the blur factor for the given virtual scene element.
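
Steps a) through d) above could be prototyped roughly as in the following sketch; the fixed subregion grid, the mapping to stage positions, and the blur heuristic are assumptions made for illustration and are not the disclosed implementation.

```python
# A minimal sketch assuming the wall is divided into a grid of N x N pixel
# subregions, each mapped to a 3-D stage position in meters. The blur heuristic
# (stage depth + virtual depth vs. focus distance) is illustrative only.
import math

def subregion_index(display_x: int, display_y: int, n: int) -> tuple[int, int]:
    """Step a): which N x N subregion a display-wall pixel falls in."""
    return display_x // n, display_y // n

def stage_subregion_depth(subregion_center_stage_m, camera_pos_m) -> float:
    """Step b): distance from the camera position to the subregion's stage position."""
    return math.dist(subregion_center_stage_m, camera_pos_m)

def blur_factor(stage_depth_m: float, virtual_depth_m: float,
                focus_distance_m: float) -> float:
    """Step d): blur from the combined stage + virtual depth relative to focus."""
    return abs((stage_depth_m + virtual_depth_m) - focus_distance_m)

# Example usage with hypothetical values.
idx = subregion_index(1920, 540, n=64)
stage_d = stage_subregion_depth((0.0, 1.5, 4.2), (0.0, 1.6, 0.0))
print(idx, blur_factor(stage_d, virtual_depth_m=6.0, focus_distance_m=5.0))
```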

Implementations may include one or more of the following features. In some implementations, the virtual scene display varies from a plane perpendicular to a camera optical axis. In some implementations, the corresponding subregion includes an N × N pixel array. In some implementations, the corresponding subregion includes a single pixel. In some implementations, each subregion of the plurality of subregions is of a same size or number of pixels as the other subregions in the plurality of subregions. In some implementations, at least one subregion of the plurality of subregions is of a different size or number of pixels than at least one other subregion of the plurality of subregions. In some implementations, the subregions of the plurality of subregions are contiguous. In some implementations, at least one subregion of the plurality of subregions is not contiguous with any other subregion of the plurality of subregions. In some implementations, determining each blur factor includes computing a transparency for pixels associated with object edges associated with a virtual scene element within a blurred subregion. In some implementations, each blur factor is applied as part of the rendering of the virtual scene. In some implementations, each blur factor is applied to pixels of the rendered virtual scene that are associated with a respective blurred subregion. In some implementations, the blur factor of the respective blurred subregion is updated when a change occurs to at least one of the camera position, the respective subregion of the respective virtual scene element, and/or a focus parameter of the camera. Some implementations include a non-transitory computer-readable storage medium storing instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer-readable medium carrying instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer system including: one or more processors; and a storage medium storing instructions, which when executed by the one or more processors, cause the computer system to implement the method. Some implementations include a carrier medium carrying image data that includes pixel information generated according to the method.

One general aspect includes a computer-implemented method for generating a virtual scene rendering of a captured scene. The computer-implemented method includes determining a camera position of a camera in a stage environment; determining a display position of a virtual scene display in the stage environment; determining focus parameters of the camera; determining a depth value for a given virtual scene element viewable as an image on the virtual scene display, where the depth value corresponds to a virtual distance in the virtual scene between the given virtual scene element and a virtual camera viewpoint located at the camera position; determining a desired focus model, based at least on: the focus parameters of the camera; the depth value; and a desired lens effect; determining an adjusted focus for the given virtual scene element based on the desired focus model; and applying at least a portion of the adjusted focus to at least one of: the focus parameters of the camera; the image of the given virtual scene element on the virtual scene display; or pixels representing the given virtual scene element in a composite image, captured by the camera, of the stage environment and the virtual scene display. Other implementations of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
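
To make the applying step concrete, the following hedged sketch routes an adjusted-focus amount among the three possible targets named above (camera focus parameters, the display-wall image, or composite-image pixels); modeling the adjusted focus as a single scalar and the particular split are illustrative assumptions only.

```python
# A minimal sketch, not the claimed method: the adjusted focus is modeled as a
# single blur amount split among the physical lens, the display-wall image, and
# post-processing of the composite. All names and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class AdjustedFocusPlan:
    camera_blur: float   # portion realized through the camera's focus parameters
    wall_blur: float     # portion baked into the image on the display wall
    post_blur: float     # portion applied to composite-image pixels afterward

def plan_adjusted_focus(total_blur: float,
                        camera_share: float = 0.0,
                        wall_share: float = 1.0,
                        post_share: float = 0.0) -> AdjustedFocusPlan:
    """Split the adjusted focus; shares of (0, 1, 0) apply it entirely on the wall."""
    assert abs(camera_share + wall_share + post_share - 1.0) < 1e-9
    return AdjustedFocusPlan(total_blur * camera_share,
                             total_blur * wall_share,
                             total_blur * post_share)

plan = plan_adjusted_focus(0.21, camera_share=0.0, wall_share=1.0, post_share=0.0)
```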

Implementations may include one or more of the following features. In some implementations, applying the at least a portion of the adjusted focus includes applying the entire adjusted focus to the focus parameters of the camera. In some implementations, applying the at least a portion of the adjusted focus includes applying the entire adjusted focus to the image of the virtual scene element on the virtual scene display. In some implementations, applying the at least a portion of the adjusted focus includes applying the entire adjusted focus to the pixels representing the given virtual scene element in the composite image. In some implementations, the focus parameters of the camera include at least one of a depth of field for the camera, a focal point for the camera, a focal length of a lens of the camera, or predetermined parameters defining lens characteristics of a lens used on the camera. In some implementations, the desired lens effect includes at least one of a bokeh, vignette, lemmoning, lens flare, diopter, aberration, fisheye, filter, mask, slit, or grating effect. In some implementations, the composite image includes an optical view of the stage environment, where the stage environment is a movie set, and where the virtual scene display is positioned in the stage environment further from the camera than at least one live actor visible in a camera scene captured by the camera. In some implementations, the virtual scene display includes an LED wall. In some implementations, the virtual scene display is positioned as a background in the stage environment and spans an entirety of a scene captured by the camera. In some implementations, the given virtual scene element is within a viewing frustum of the camera. In some implementations, the given virtual scene element is outside a viewing frustum of the camera, and the stage environment is lit at least in part by light emitted by the given virtual scene element. In some implementations, the image of the given virtual scene element on the virtual scene display is a low-resolution, low-frame-rate, or low-dynamic-range image. In some implementations, the pixels representing the given virtual scene element in the composite image are replaced with second pixels of a high-resolution, high-frame-rate, or high-dynamic-range second image. Some implementations include a non-transitory computer-readable storage medium storing instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer-readable medium carrying instructions, which when executed by at least one processor of a computer system, causes the computer system to carry out the method. Some implementations include a computer system including: one or more processors; and a storage medium storing instructions, which when executed by the one or more processors, cause the computer system to implement the method. Some implementations include a carrier medium carrying image data that includes pixel information generated according to the method. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter. A more extensive presentation of features, details, utilities, and advantages of the methods, as defined in the claims, is provided in the following written description of various implementations of the disclosure and illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Various implementations in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 illustrates an example real-world stage, in accordance with at least one implementation of the present disclosure.

FIG. 2 illustrates an example real-world stage, which includes a display wall, behind which exists a virtual space comprising a virtual scene, in accordance with at least one implementation of the present disclosure.

FIG. 3 illustrates an example real-world stage, which includes a display wall, behind which exists a virtual space comprising a virtual scene, in accordance with at least one implementation of the present disclosure.

FIG. 4 illustrates an example real-world stage, which includes a display wall, behind which exists a virtual space comprising a virtual scene, in accordance with at least one implementation of the present disclosure.

FIG. 5 illustrates an example real-world stage, which includes a display wall, behind which exists a virtual space comprising a virtual scene, in accordance with at least one implementation of the present disclosure.

FIG. 6 illustrates an example real-world stage, which includes a display wall, behind which exists a virtual space comprising a virtual scene, in accordance with at least one implementation of the present disclosure.

FIG. 7 illustrates an example video wall divided into a plurality of subregions, in accordance with at least one implementation of the present disclosure.

FIG. 8 shows a virtual production set, in accordance with at least one implementation of the present disclosure.

FIG. 9 is a flowchart of an exemplary method as might be performed by an image processor to defocus a sharp rendered image, in accordance with at least one implementation of the present disclosure.

FIG. 10 is a flowchart of an exemplary method as might be performed by an image processor to defocus a sharp rendered image, in accordance with at least one implementation of the present disclosure.

FIG. 11 illustrates an example of defocusing a virtual object, in accordance with at least one implementation of the present disclosure.

FIG. 12 illustrates an example visual content generation system as might be used to generate imagery in the form of still images and/or video sequences of images, in accordance with at least one implementation of the present disclosure.

FIG. 13 is a block diagram that illustrates a computer system upon which the computer systems of the systems described herein and/or visual content generation system may be implemented, in accordance with at least one implementation of the present disclosure.

DETAILED DESCRIPTION

In the following description, various implementations will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the implementations. However, it will also be apparent to one skilled in the art that the implementations may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the implementation being described.

A virtual scene might be represented by data corresponding to pixels in image space, such as a deep image. The deep image might be generated from virtual objects described in a scene space and then, by rendering or otherwise, be represented in an image dataset that might specify, for example, for each pixel in a pixel array, a pixel image value array. Each entry in the pixel image value array might comprise a pixel color value, an optional alpha value, a depth value or a depth range, and an object identifier identifying which object contributes that color/alpha at the specified depth. The pixel image value array might be associated with a particular pixel by an explicit reference to an associated pixel position or the particular pixel might be determinable by a position of the pixel image value array within the image dataset.
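
One way to picture the pixel image value array described above is as a small per-pixel list of samples; the class and field names below are illustrative assumptions rather than the format of any particular renderer.

```python
# A minimal sketch of a deep-image sample list per pixel, assuming each sample
# stores a color, an optional alpha, a depth (or depth range), and an object id.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeepSample:
    color: tuple[float, float, float]  # pixel color value (r, g, b)
    alpha: Optional[float]             # optional alpha value
    depth_near: float                  # depth value, or start of a depth range
    depth_far: float                   # end of the depth range (== depth_near for a point)
    object_id: int                     # which object contributes this color/alpha

# A pixel image value array is then an ordered list of samples for one pixel; its
# pixel position can be stored explicitly or implied by its position in the dataset.
pixel_samples = [
    DeepSample((0.2, 0.5, 0.1), 1.0, 10.0, 10.0, object_id=7),   # near tree
    DeepSample((0.6, 0.6, 0.9), 0.4, 42.0, 55.0, object_id=12),  # distant haze
]
```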

Computer simulation that is used with live actors may be placed into a live action scene in different ways. Conventionally, live actors may act in front of a green screen or other colored background that allows for chroma keying to provide visual effects in post-production and after a live action scene is captured. This may be used to provide different background effects, which allows for providing computer simulated effects, backgrounds, and the like with live actors. However, chroma key compositing may suffer from issues of realism when adding visual effects in post-production. For example, the green screen does not include any computer simulation and/or animation when the live actor is acting in the live action scene. Thus, the live actor may be required to pretend that certain animated portions of the scene are present. This may be an issue where those animated elements are interacted with by the live actor, such as an environmental object and/or character. A live actor may not know exactly where a cliff or edge may be when later added as animated elements or may not know the exact location of a character or creature that the live actor is engaging with in the live action scene. Further, the green screen used for chroma keying does not provide lighting or colors, which may be projected onto the live actor (e.g., in the case of an explosion or the like) or may be reflected in the live actors’ eyes, glasses, wardrobe pieces, or the like.

In this regard, a display wall may be used, which may include a display screen (e.g., an LED, LCD, LED LCD, OLED, or the like) that is capable of outputting a rendering of an image or a video. The rendering may correspond to a precursor image, which may be an image from a renderer or a compositor. This image may be entirely or partially computer generated and/or animated, may be captured earlier from a live action scene, or a combination thereof. The precursor image may be a single image displayed on the display wall or may be a sequence of images, such as frames of a video or animation. The precursor image may include precursor metadata for computer generated imagery and/or pixel display data for pixels of the display wall. In this regard, the precursor metadata may include output pixels, data, color, intensity, and the like for outputting the image on the display wall.

The display wall may correspond to one or more structures that are then positioned in the real-world live action scene, and may be planar, curved, or the like. In this regard, the display wall may serve as a background; however, this may not be the only orientation, and the display wall may also be placed above, next to, or otherwise oriented with regard to a live actor. The display wall may be placed relative to a scene or stage element, such as a live actor or other object in the 3D real-world scene for the live action. The live actor then acts and interacts with the live action scene corresponding to the real-world environment that is being captured by one or more cameras. The live action scene may also be captured by other sensors and/or sensing devices, including optical sensing devices, depth sensors and/or ranging sensors (e.g., LiDAR), and the like.

When the live actor interacts with and/or performs in the live action scene, the display wall may output one or more precursor images that may be used with the live actor to capture image data. This image data may be captured by one or more cameras. For example, a camera may be oriented relative to the 2D display wall to capture background imagery and pixels of the 2D display wall while stage elements are present in a live action scene in front of and/or relative to the 2D display wall. In some implementations, the image data may be stereoscopically captured by two or more cameras placed proximate to each other in different locations so that the cameras may capture different angles of the live action scene. In some implementations, the cameras may be oriented so as to mimic how a person or creature may view the live action scene, such as how a human would view the live action scene. This may be used to provide additional realism to capturing the live action scene and/or capture the live action scene in 3D so that the image data may later be rendered and output in 3D when viewed by an audience.

The live actor may be engaging with the display wall and/or other elements and objects in the live action scene while computer generated imagery is being generated by a renderer and displayed on the display wall. The display wall may also emit or output light, colors, and the like that are projected on and/or reflected by the live actor or other objects in the live action scene. An image processor may, in real-time, near real-time, or at a later post-processing time, determine which portions of the captured image data correspond to the live actor, and which portions correspond to the display wall (e.g., background or display wall pixels). The portions may correspond to individual pixels of the display wall and/or live actor in the image data and may be different between different image data captured by each camera. The image processor may then generate an image matte, which may be generated and/or stored as a pixel array where values or data for pixels in the pixel array indicate whether the corresponding pixel is part of the display wall or part of a foreground actor, object, or the like in the live action scene.

The image matte may then be used to selectively modify pixels in the precursor metadata for the displayed precursor image to change the output or rendering of those pixels when displayed on the display wall and/or captured in the camera image data. In some implementations, the image matte may be a binary matte or a junk matte (e.g., where pixels are identified as part of the display wall, not part of the display wall, and/or uncertain as to whether those pixels belong to the display wall). For example, the image matte may be used to modify pixels that correspond to the display wall when a precursor image is being displayed on the display wall or at a later time when the stereoscopic image data is processed and background pixels are moved, warped, adjusted, and/or re-rendered. Thus, the image matte may correspond to an alpha channel image that may allow for modification, adjustment, and/or change of one or more portions of the stereoscopic image data (e.g., the live actor, the display wall, or another portion of the image data).
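
A hedged illustration of how such a matte might drive selective modification follows; the trinary labels, array layout, and blending rule are assumptions for illustration, not the disclosed implementation.

```python
# A minimal sketch, assuming numpy images of shape (H, W, 3) and an integer
# matte of shape (H, W) with 0 = not display wall, 1 = display wall,
# 2 = uncertain (a "junk matte" style labeling).
import numpy as np

NOT_WALL, WALL, UNCERTAIN = 0, 1, 2

def apply_matte(captured: np.ndarray,
                adjusted_wall_pixels: np.ndarray,
                matte: np.ndarray,
                uncertain_blend: float = 0.5) -> np.ndarray:
    """Replace display-wall pixels with adjusted (e.g., depth-defocused) pixels,
    leave foreground pixels alone, and blend in uncertain regions."""
    out = captured.copy()
    wall = matte == WALL
    unsure = matte == UNCERTAIN
    out[wall] = adjusted_wall_pixels[wall]
    out[unsure] = (uncertain_blend * adjusted_wall_pixels[unsure]
                   + (1.0 - uncertain_blend) * captured[unsure])
    return out
```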

In this regard, the precursor metadata may include a scene description with information about the 3D positions of computer-generated objects, characters, or the like in the precursor image for the display wall. Modification of pixels corresponding to the display wall may include replacement of pixel color values and/or pixel outputs that correspond to the display wall with the pixel values that would have actually been captured (including focus levels varying with depth) if the precursor image on the display wall actually existed in the live action scene and real-world 3D environment. This may allow cameras to capture images of the live actor in front of the display wall, but with focal properties of virtual objects modified according to depth, so that the captured image data appears as if the precursor image was actually present in 3D instead of 2D on the display wall. Thus, a renderer and/or compositor system may perform pixel replacement corresponding to the display wall, taking into account precursor metadata of the precursor image on the display wall, to present a processed precursor image on the display wall for an image in a live action scene.

The image processor may generate the image matte using different techniques. For example, the image processor may determine the image matte by computing it from the precursor image metadata and/or display wall metadata (e.g., information for placement of the display wall, depths to 3D stage elements and/or cameras, lighting or effects in the real-world environment, etc.) together with the image data, based on what the camera “should” see given its lens characteristics and settings. Further, the image processor may determine where the display wall is in a scene, such as a distance or depth from the cameras to the display wall, to identify those pixels that correspond to the display wall and those that correspond to a foreground object or live actor.

Once the image processor determines the image matte, the image processor may execute different processes with the image matte. For example, the image processor may “splat” or otherwise blur certain pixels, as well as potentially adjusting, moving, and/or warping pixels of the precursor image from a camera position and/or camera settings with the precursor metadata and/or display wall metadata. This may be based on the position of the actual camera and display wall at the time of capture, and the adjustments or replacement pixels may be generated using the image matte to appear as a 3D image or imagery on the display wall. Depth may also be determined using one or more depth sensors for pixel identification and replacement. Further, this may account for different types of cameras capturing the scene, such as a “hero” camera for a main camera capturing the important or main elements of the live action scene (e.g., the live actor), as well as non-hero cameras.

Although the pixels are referred to as replacement pixels, the pixels may instead be data and/or metadata that cause changing, warping, adjusting, or moving original background pixels associated with the display wall instead of directly replacing such pixels. Adjusting pixels may include warping or changing pixels, which may change pixel values or data (e.g., color, brightness, transparency, luminosity, effect, intensity, and/or the like). Thus, pixel adjustments may include changes to pixel values and data in image data instead of or in addition to direct replacement of pixels with another or different pixel.

A scene may be defocused by applying a lens blur filter with different parameters to give the resulting image a depth of field effect similar to what may be achieved in a photograph taken with a physical camera. Existing methods of doing so can be computationally expensive and achieve limited results. For example, a scene may be divided into vertical layers based on the distance of pixels from a virtual camera, and a blurring effect may be applied to each layer. The blurring effect applied, however, may be uniform throughout a layer (e.g., every pixel in the layer is blurred by the same amount), resulting in images that lack detail, particularly when a narrow depth of field is involved.
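
For reference, a bare-bones version of the layered approach described above might look like the following sketch; the Gaussian stand-in for a lens blur, the layer boundaries, and the array conventions are illustrative assumptions, and the per-layer uniform blur exhibits exactly the limitation noted above.

```python
# A minimal sketch, assuming numpy/scipy, an image of shape (H, W, 3), and a
# per-pixel depth map of shape (H, W) in meters. A Gaussian filter stands in
# for a true lens blur; each layer receives one uniform blur amount.
import numpy as np
from scipy.ndimage import gaussian_filter

def layered_defocus(image, depth, layer_edges, focus_depth, blur_per_meter=0.8):
    """Blur each depth layer by an amount proportional to its distance from focus."""
    out = np.zeros_like(image)
    for near, far in zip(layer_edges[:-1], layer_edges[1:]):
        mask = (depth >= near) & (depth < far)
        layer_center = 0.5 * (near + far)
        sigma = blur_per_meter * abs(layer_center - focus_depth)
        # Blur spatially only (not across color channels); sigma 0 means in focus.
        blurred = gaussian_filter(image, sigma=(sigma, sigma, 0)) if sigma > 0 else image
        out[mask] = blurred[mask]
    return out
```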

In a specific example, a scene description might describe tree objects in a forest of trees. An image dataset might be provided to an animator that is a deep image of that scene. It may be desirable to defocus objects within the scene to draw attention to different objects in the scene in a way that emulates the depth of field effect of a physical camera. For example, trees in the background may be defocused while focus is placed on a running character. Achieving a desired result for the look of a scene may require experimentation with different parameters such as the amount of blur to be applied, a lens shape, and a lens effect. Modifications to any of the parameters may require rendering a scene again to view the result of the modifications. Therefore, it would be useful to be able to defocus objects or sections of a scene post-rendering, in real time or near-real time on the display wall, without requiring re-rendering the scene.

The present disclosure aids substantially in generating visual content such as movies and videos, by improving the realism of virtual sets, reducing the amount of computation required to render them realistically, and reducing or eliminating the need to combine “green-screen” live action footage with computer-generated content. Implemented on a display in communication with a processor, the defocusing systems, methods, and devices disclosed herein provide a practical improvement in the rendering of visual images on displays such as video wall backdrops. This improved dynamic backdrop capability or real-time matte capability transforms a multi-step production process into a simple stage production, without the normally routine need to combine multiple independent video streams into a single moving image. This unconventional approach improves the functioning of a movie soundstage, by permitting actors and props to be photographed in front of moving, realistically focused background scenes that emit light and create shadows in realistic ways.

FIG. 1 illustrates an example real-world stage 100, in accordance with at least one implementation of the present disclosure. The real-world stage comprises a real-world scene 105, which includes a display wall 102 (also known as a video wall, LED wall, digital wall, virtual scene display, virtual backdrop, dynamic backdrop, or virtual production display), behind which exists a virtual space 150 that may for example be populated with digital scenes, objects, characters, environments, etc., which collectively represent a virtual scene 106 that may be displayed as one or more 2D images on video wall 102, thus creating a combined physical and virtual stage 107 (also referred to as a combined scene, combined real-world scene, or virtual production set). While various examples herein may refer to a display device as a video wall, it should be clear that the display device could extend the entirety of the background of a real-world stage scene, extend partially so, and/or could be planar, curved, piecewise linear, or in some other configuration.

Display wall 102 exists at a particular position or set of positions on the real-world stage. Display wall 102 may correspond to a liquid crystal display (LCD), a light-emitting diode (LED) display, plasma display, a combination thereof, or the like (including LCD LED, thin-film transistor (TFT) LCD, OLED, etc.), projection display, etc. In this regard, display wall 102 may comprise pixels 104, which may be used to emit light of certain colors, intensities, and other parameters when outputting and displaying one or more images. Pixels 104 may be picture elements or image elements that may correspond to the smallest point or controllable element of display wall 102 that may make up imagery displayed on display wall 102. This may be used to display a virtual scene 106 from a renderer, which may correspond to computer-generated imagery, earlier captured or recorded live action scenes and/or objects, or a combination thereof. In order to display scene 106 on display wall 102 using pixels 104, a renderer and/or compositor may be used for image processing and output. Scene 106 may correspond to a precursor image that is displayed on display wall 102. A precursor image may be one or multiple individual images, which may be displayed in sequence, such as frames of an animation or video. The precursor image for scene 106 may therefore be provided by a renderer and/or may be processed using a compositor. The precursor image may also be associated with precursor metadata that may include computer-generated imagery metadata, such as the scene description for scene 106 that is used in rendering and outputting scene 106 on display wall 102 using pixels 104.

A renderer may correspond to hardware and/or software that generates one or more images, or data usable for representing those images, based on a scene description; the images may be single images or frames of an animation or video. A compositor may correspond to hardware and/or software that may combine image data of a captured live action scene and a rendered scene to form composited imagery. The captured live action scene may be captured as image data, such as that of a live actor performing during recording, and live action metadata of the capturing of the scene, such as camera settings, camera positions, lighting conditions, visual effects, etc.

Display wall 102 may correspond to one or more structures that may be positioned in a live action scene and be capable of displaying imagery from a renderer, compositor, or the like. In some instances, display wall 102 may be a single, planar display and therefore provide a 2D output of virtual scene 106. In other instances, display wall 102 may correspond to one of a plurality of planar or curved elements and/or display panels and/or may include multiple display panels in varying positions or orientations to generate a single wall for precursor imagery. Display wall 102 may correspond to an LED wall or other structure capable of displaying imagery. Display wall 102 may correspond to a background of a live action scene, but it is not necessary that display wall 102 be a background; it may instead be oriented and/or otherwise placed elsewhere in a live action scene. Display wall metadata for display wall 102 may correspond to data that represents details of display wall 102, such as its construction, its orientation, resolution, size, etc., and its position in the live action scene. In some implementations, this may be measured or determined by one or more depth, distance, and/or ranging sensors, which may be used in combination with optical cameras and/or sensors for distance finding between real-world 3D objects and/or display wall 102 and one or more cameras. In some instances, display wall 102 may partially or completely encircle stage 100. In some instances, display wall 102 may be or may include a ceiling or floor portion. In some instances, display wall 102 may be a dome or other three-dimensional shape.

In the example shown in FIG. 1, stage 100 includes a first camera, first camera position, or first real-world camera 120 a having a viewing frustum 130 a that extends toward video wall 102, and a corresponding virtual viewing frustum 140 a that extends “behind” video wall 102 and into virtual space 150. The size and shape of viewing frustum 130 a and virtual viewing frustum 140 a may depend for example on lenses, shades, apertures, etc. of first camera 120 a. A second camera, second camera position, or second real-world camera 120 b has a viewing frustum 130 b and corresponding virtual viewing frustum 140 b. In some cases, for realism and to minimize the need for post-production editing, the viewing frustum of a camera may be selected such that the scene viewable by the camera falls within the boundaries of video wall 102, although in some cases the viewing frustum may encompass at least a portion of a floor, ceiling, or wall of stage 100, where the floor, ceiling, or wall includes practical set elements whose presence in the viewing frustum is desired.

Located on stage 100 is a first physical object or scene element 160 (e.g., an actor, stage prop, etc.), which is visible to both first camera 120 a and second camera 120 b. A physical distance, range, relative position, or depth D1 exists between first camera 120 a and first physical object 160. This physical distance or stage environment distance D1 may for example be a distance measurement generated by one or more sensors. Depending on selected lens 125 and focus settings 126 of first camera 120 a, object 160 may be in focus, or may be defocused to a certain degree that depends on the range D1. Focus settings 126 may also be referred to as focal settings, focal parameters, lens settings, lens parameters, etc. For example, an operator may wish to de-emphasize foreground or background objects such that a viewer’s attention is drawn to characters or objects at a middle distance. In some instances, either or both of the range D1 or focus settings 126 (including zoom level, depth of field, a focal point, a focal length, aberration, and/or bokeh, vignette, or lemmoning effects) may change dynamically while a scene is being filmed. For example, a camera operator or camera operation algorithm may adjust the focus of the camera continuously through a rack focus range, recording image capture of the stage environment as the focus is adjusted. In some implementations, the system may also include a feedback loop between a blur factor of the virtual object 180 and the focus settings 126 of the camera 120, to make sure that changes in the real blur or defocus are matched by changes in the virtual blur or defocus.

Similarly, a physical distance D2 exists between second camera 120 b and first physical object 160, which may be in or out of focus in view of camera 120 b. A second physical object or scene element 170 is also positioned on stage 100, at a range D3 from second camera 120 b. Second physical object 170 is positioned within viewing frustum 130 b of second camera 120 b, and thus second physical object 170 is visible to second camera 120 b. However, second physical object 170 is positioned outside of viewing frustum 130 a of first camera 120 a, and so second physical object 170 is not visible to first camera 120 a, unless first camera 120 a reorients, or second physical object 170 moves, such that second physical object 170 falls within viewing frustum 130 a of first camera 120 a.

Positioned within virtual space 150, “behind” video wall 102, is a first virtual object or virtual scene element 180, with a given position, size, shape, and orientation within virtual space 150. First virtual object 180 may be stored in a processor memory and/or may be rendered as pixels 104 on video wall 102. Viewing frustum 130 a of first camera 120 a extends into virtual space 150 as a virtual viewing frustum 140 a. Similarly, viewing frustum 130 b of second camera 120 b extends into virtual space 150 as a virtual viewing frustum 140 b. Because first virtual object 180 falls within both virtual viewing frustum 140 a and virtual viewing frustum 140 b, first virtual object 180 is visible to both first camera 120 a and second camera 120 b. In some instances, switching to a different camera view may require updating images on the display wall 102 to account for a different virtual camera position and orientation.

A range, distance, relative position, or depth D4 between the first camera or first camera position 120 a and first virtual object 180 exists, e.g., the distance, on a straight line connecting first camera 120 a with first virtual object 180. The range, distance, or depth D4 includes a real component D4_(real), representing the distance between first camera 120 a and display wall 102, and a virtual component D4_(virtual), representing the distance, in virtual space 150, between video wall 102 and first virtual object 180. The video wall or display wall 102 may be a physical object (or arrangement of objects) on the stage, for which there may be no corresponding virtual object in virtual space 150. In some implementations, the first camera or first camera position 120 a may correspond to a virtual camera 120 a _(virtual) in the virtual space, although this may not always be the case. Although first virtual object 180 is not “real”, it can be treated mathematically as though it were physically present on stage 100, at a range, depth, or distance of D4 from first camera 120 a. Henceforth in this document, it should be understood that any distance, depth, or range expressed between a physical camera and a virtual object may be a sum of a real distance, depth, or range and a virtual distance, depth, or range, but can be treated mathematically as a real distance between the camera and a real object. For example, a range, depth, or distance D5 separates first virtual object 180 from second camera 120 b.
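
To illustrate the relationship D4 = D4_(real) + D4_(virtual) numerically, the short sketch below finds where the straight camera-to-object line crosses the wall plane and sums the two segments; the coordinate conventions and the assumption of a planar wall are illustrative only.

```python
# A minimal sketch, assuming a planar display wall perpendicular to the stage
# z-axis at z = wall_z, the camera in front of it (z < wall_z), and the virtual
# object "behind" it (z > wall_z). Positions are (x, y, z) in meters.
import math

def split_depth(camera_pos, virtual_object_pos, wall_z):
    """Return (D_real, D_virtual, D_total) along the straight camera-object line."""
    total = math.dist(camera_pos, virtual_object_pos)
    # Fraction of the line at which it crosses the wall plane.
    t = (wall_z - camera_pos[2]) / (virtual_object_pos[2] - camera_pos[2])
    d_real = t * total          # camera to wall, along the line
    d_virtual = total - d_real  # wall to virtual object, along the line
    return d_real, d_virtual, total

# Example: camera 4 m in front of the wall, object 6 m behind it (on-axis).
print(split_depth((0.0, 1.6, 0.0), (0.0, 1.6, 10.0), wall_z=4.0))
# -> (4.0, 6.0, 10.0)
```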

Virtual space 150 also contains a second virtual object or virtual scene element 190, which falls outside virtual viewing frustum 140 a and is thus not visible to first camera 120 a. However, second virtual object 190 falls within virtual viewing frustum 140 b and is thus visible to second camera 120 b. A range, distance, or depth D6 exists between second camera 120 b and second virtual object 190. Real-world scene 105 and virtual scene 106 collectively form a combined scene or combined real-world scene 107.

In some aspects, lens effects may be simulated partially or entirelyusing the display wall 102. For example, real world camera 120 a mayincorporate a lens setup with wide depth of field and a focus out to alarge distance (e.g., infinity), whereas defocus for virtual objectsdisplayed on the video wall 102 may be based on their depth. In otheraspects, a virtual focus model of virtual camera 120 a _(virtual) may beemployed to introduce lens effects such as lens flare, diopters (e.g.,to defocus edges of a scene differently than central regions),aberrations, fisheye, filters, masks, slits, gratings, etc. to theimages on the display wall 102. This may advantageously permit lenseffects to be changed rapidly, in real time or near-real time, withoutthe need to change the lens 125 or lens settings 126 of camera 120 a. Inother aspects, both the camera 120 a and the display wall 102 may showobjects in sharp focus, while blur or defocus effects may be added to orremoved from real or virtual objects in a post-processing step. Thus, ina final rendered image, the focus level or blur level of any real objector virtual object visible in the image may be a combined function ofcamera focus, display wall defocus or blur effects, and post-processingdefocus or blur effects. Such a setup may advantageously permit lenseffects to be added, removed, or modified at various stages ofproduction, including on the display wall 102 in real time, without theneed to change cameras or lenses. In scenes that are predominantlyvirtual, it may be advantageous to introduce lens effects exclusively onthe display wall 102, or exclusively in the post-processing step.

In some aspects, images shown on the display wall 102 may not be withinthe viewing cone of a camera at all. This may for example occur wherethe display wall 102 is employed for dynamic lighting effects. Forexample, if a virtual object “behind” the display wall 102 is a fire,explosion, or other radiant phenomenon, then realistically lighting theface of a real-world actor or object, proximate to virtual object, mayrequire realistically defocusing an image of the virtual object on thedisplay wall 102, even if the virtual object (or indeed, the entiredisplay wall 102) is not in frame.

In some aspects, low-resolution, low-frame-rate, or low-dynamic-rangeimages may be shown on the video wall 102, in order to provide lightingeffects to objects or actors on the real-world stage and/or to provideactors with rough images to guide their movements, speech, and actions.In such cases, some or all portions of the video wall imagery may bereplaced during post-processing with higher-resolution,higher-frame-rate, or higher-dynamic-range images. Virtual objects mayalso be added, deleted, moved, or altered during this post-processingstep. In other aspects, a high frame rate for the display wall 102 mayallow multiple images to be displayed in an alternating fashion, e.g.,to provide stereo imagery.

FIG. 2 illustrates an example real-world stage 100, which includes adisplay wall or video wall 102, behind which exists a virtual space 150comprising a virtual scene 106, in accordance with at least oneimplementation of the present disclosure. A real-world stage 100 is anenvironment in which a camera (e.g., camera 120 a or 120 b) can capturelight, and possibly other data, from what is in front of the camera. Thecamera can be an optical device that captures light, visible and/or notvisible, from elements present in the real-world stage 100. Examples ofreal-world stages include sound stages, outdoor scenery, and otherplaces where image capture might take place using a camera or opticaldevice positioned before the real-world stage.

A real-world scene (e.g., real-world scene 105 of FIG. 1 ) is a scene ofreal-world scene elements (such as, for example, physical objects 160and 170 of FIG. 1 ), images of which can be captured by a camera.Real-world scene elements might comprise actors, characters, beings,objects, lighting, etc. that are visible within a viewing frustum (e.g.,viewing frustum 130 a or 130 b of FIG. 1 ) of a real-world camera, orcould affect what is captured by the real-world camera.

A virtual scene (e.g., virtual scene 106 of FIG. 1 ) is a scene described by computer-readable data structures that may include virtual scene elements (e.g., virtual objects 180 and 190 of FIG. 1 ), lighting information, one or more virtual camera viewpoints, and one or more virtual camera view frames. In the case of stereoscopic virtual scenes, there might be at least two virtual camera viewpoints. A virtual scene need not be physically realizable, but could be generated consistent with real-world physical or geometrical constraints, or other constraints.
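For illustration only, such a computer-readable description might be held in simple data structures like the following Python sketch; the field names and types are assumptions, not a required schema.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    Vec3 = Tuple[float, float, float]

    @dataclass
    class VirtualSceneElement:
        name: str
        position: Vec3       # in the virtual scene space coordinate system
        size: Vec3
        orientation: Vec3    # e.g., Euler angles in degrees

    @dataclass
    class VirtualScene:
        elements: List[VirtualSceneElement] = field(default_factory=list)
        lights: List[dict] = field(default_factory=list)
        camera_viewpoints: List[Vec3] = field(default_factory=list)  # two or more for stereo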

Virtual scene elements might comprise actors, characters, beings,objects, lighting, etc. that are to be depicted in a virtual scene orthat can affect an appearance of a virtual scene. A virtual sceneelement might have metadata or parameters associated therewith that canbe used in rendering the virtual scene, such as position and size of thevirtual scene element in a virtual scene space coordinate system. Somevirtual scene elements might be imagery captured previously from areal-world stage.

Rendering can be a process of generating an image, or a sequence ofimages generated according to a virtual scene. The sequence of imagesmay form a timed sequence of images such as a video sequence. Renderedimages can be stored in computer-readable memory.

A virtual focus model 226 can be represented by a data structure defining how focus should be applied to virtual scene elements, which can be a function of their positions in the virtual scene space coordinate system relative to the virtual camera viewpoint, as well as focus parameters of a lens of a physical or virtual camera (e.g., focus parameters 126 of lens 125, as shown for example in FIG. 1 ). A virtual focus model 226 might specify a depth of field, a focal point, a focal length, and/or bokeh effects for one or more virtual scene elements that are to be applied to such virtual scene elements when rendering the virtual scene.
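As one possible, non-limiting representation, a virtual focus model might be stored as a small record of lens-derived parameters, as in the following Python sketch; the field names are illustrative assumptions only.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class VirtualFocusModel:
        focal_length_mm: float          # focal length of the emulated lens
        f_number: float                 # aperture (f-stop) of the emulated lens
        focus_distance_m: float         # depth at which the virtual camera is focused
        bokeh_shape: str = "circular"   # e.g., "circular" or "hexagonal"
        depth_of_field_m: Optional[float] = None  # optional explicit depth of field

    # Example: approximate a 50 mm lens at f/2.8 focused 3 m from the camera.
    model = VirtualFocusModel(focal_length_mm=50.0, f_number=2.8,
                              focus_distance_m=3.0)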

A virtual scene display might be a display device that can receiverendered images and display those rendered images. The virtual scenedisplay could be an LED wall 102, and could be planar, curved, piecewiseplanar, or some other shape.

A combined real-world scene (e.g., combined real world scene 107 of FIG.1 ) might be a real-world scene wherein at least one of the real-worldscene elements is a virtual scene display displaying a rendered image ofa virtual scene. In the example shown in FIG. 2 , video wall 102displays a first calibration image 210, comprising pixels generated onthe video wall 102. The first calibration image falls within viewingfrustum 130 a of first real-world camera 120 a and viewing frustum 130 bof second real-world camera 120 b, and so is visible to both cameras. Asecond calibration image 220 is also displayed on video wall 102, but isvisible only to second real-world camera 120 b. A calibration image 210or 220 may for example include a test pattern, which may comprise lines,circles, polygons, or other shapes of varying size, line weights,colors, etc., or may comprise photographic images or other images. Thecalibration image may be stored in a memory and retrieved for imagegeneration and display, or may be calculated at the time of display,based on real or desired focal parameters, or other considerations. Acalibration image 210 or 220 may be used for example to adjust focussettings 126 of a lens 125 of a real-world camera 120 a so that, forexample, calibration image 210 or 220 appears in good focus when thefocal depth of camera 120 a or 120 b is set to a depth similar to thedistance between camera 120 a or 120 b and video wall 102 at theposition of calibration image 210 or 220.

In an example, a camera (e.g., camera 120 a or 120 b) may capture afirst image of the combined real world and virtual scene that includes acalibration image (e.g., calibration image 210 or 220), and may thenadjust its focus or position before capturing a second image, alsoincluding the calibration image. A comparison between the first andsecond image may then be used to determine a direction and/or amount toadjust a position or focus property of the camera in order to achieve adesired level of focus. The desired level of focus may be a desiredfocus of the camera, a desired defocus or blur effect for thecalibration image or a virtual object, or combinations thereof.
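A simple sharpness comparison of the two captured images, for example a variance-of-gradient metric over the region containing the calibration image, might indicate which direction to adjust focus. The following Python sketch is illustrative only and assumes grayscale image arrays; it is not a required focus metric.

    import numpy as np

    def sharpness(gray_image):
        """Crude focus metric: variance of the image gradient magnitude."""
        gy, gx = np.gradient(gray_image.astype(float))
        return float(np.var(np.hypot(gx, gy)))

    def focus_adjustment_direction(image_before, image_after):
        """Return +1 if the adjustment improved focus, -1 if it degraded it."""
        return 1 if sharpness(image_after) > sharpness(image_before) else -1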

FIG. 3 illustrates an example real-world stage 100, which includes avideo wall 102, behind which exists a virtual space 150 comprising avirtual scene 106, in accordance with at least one implementation of thepresent disclosure. Visible are first physical object 160, secondphysical object 170, first virtual object 180, second virtual object190, first camera 120 a, second camera 120 b, first viewing frustum 130a, second viewing frustum 130 b, first virtual viewing frustum 140 a,and second virtual viewing frustum 140 b.

Also visible are first calibration image 210 and second calibrationimage 220. Unlike FIG. 2 , where calibration images 210 and 220 aretreated as 2D images on video wall 102, in the example of FIG. 3calibration images 210 and 220 have been given positions andorientations within virtual space 150. This may be done for example tohelp an operator or algorithm adjust lens parameters 126 of a real-worldcamera and/or virtual focus model 226 of a virtual camera (e.g., avirtual camera placed at the same position and orientation as thecorresponding real-world camera) to a desired focus level for differentobjects at different depths.

In this example, lens parameters 126 of first camera 120 a are such that first camera 120 a has a focal point 310 a or focal plane 320 a positioned in front of physical object 160. This may be done for example so that physical object 160 is in good focus. However, in this situation, it may not be realistic for virtual object 180 to also be in good focus. For example, if lens parameters 126 indicate that a physical object at the same depth as virtual object 180 (e.g., depth D4 as shown in FIG. 1 ) would be out of focus to a particular degree, then it may be desirable to render virtual object 180 out of focus to that same degree. However, because virtual object 180 actually exists as pixels on the surface of video wall 102, the real-world focus level of virtual object 180 will in fact be based on the depth between the first camera 120 a and the video wall (e.g., depth D4_(real)). Therefore, to defocus virtual object 180 to a realistic degree, it is desirable to create a virtual camera at the same position as camera 120 a, with a virtual focus model 226 based on lens 125 and lens settings 126 of camera 120 a. The renderer can then render virtual object 180 out of focus, to a degree proportional to its depth. In some implementations, the renderer may render virtual object 180 in focus, but its focus level or blur factor may be adjusted based on depth in real time or near-real time by a blurring algorithm, as described for example in U.S. Pat. Application No. 17/086,032, filed Oct. 30, 2020, hereby incorporated in its entirety as though fully set forth herein. It should be noted that defocus may include bokeh, vignette, or lemmoning effects.
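Under standard thin-lens assumptions (an approximation used here for illustration, not a requirement of the present disclosure), the degree to which a physical object at depth D4 would be out of focus might be estimated as a circle-of-confusion diameter, as in the following Python sketch.

    def circle_of_confusion_mm(focal_length_mm, f_number,
                               focus_distance_m, object_distance_m):
        """Thin-lens estimate of the blur-circle diameter on the sensor, in mm."""
        f = focal_length_mm
        s = focus_distance_m * 1000.0   # distance the camera is focused at, in mm
        d = object_distance_m * 1000.0  # distance to the (virtual) object, in mm
        aperture = f / f_number         # aperture diameter in mm
        # Classic thin-lens circle-of-confusion formula.
        return abs(aperture * f * (d - s) / (d * (s - f)))

    # Example: 50 mm lens at f/2.8 focused at 2 m; object at 7 m effective depth D4.
    coc = circle_of_confusion_mm(50.0, 2.8, 2.0, 7.0)

A larger circle of confusion corresponds to a stronger defocus; a renderer might map this diameter to a blur radius in display-wall pixels.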

This real-time or near-real-time defocusing or defocus adjustment may beparticularly useful in instances where either the position, orientation,or focus parameters of camera 120 a may change over time as the scene isfilmed. In some implementations, the system may include a feedback loopor the like (e.g., monitoring an encoder on the camera lens), to ensurethat gradual changes in the real blur and the virtual blur are at leastapproximately synchronized, so there is no “pop” or sudden focus changeduring a transition. Virtual focus model 226 may then for example be“slaved” to real focus parameters 126, such that the virtual camera“sees” virtual objects exactly as a real camera would see real objectsat an equivalent depth, even while the position or focal parameters ofthe camera are changing. In the example shown in FIG. 3 , calibrationimage 210 has been placed in the virtual space 150, proximate to virtualobject 180, to aid in the defocusing procedure. This may be done forexample if calibration image 210 is more recognizable than virtualobject 180, or if calibration image 210 includes particular featuresindicative of a focus level of the calibration image 210.
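For example, a control loop might periodically read the physical lens state and update the virtual focus model so that real and virtual blur change together. In the sketch below, the encoder-reading and renderer-update functions are hypothetical placeholders, not APIs of any particular camera or engine.

    import time

    def sync_virtual_focus(read_lens_encoder, update_virtual_focus_model,
                           period_s=1.0 / 48.0):
        """Slave the virtual focus model to the physical lens settings.

        read_lens_encoder          -- hypothetical callable returning the current
                                      (focal_length_mm, f_number, focus_distance_m)
        update_virtual_focus_model -- hypothetical callable pushing those values
                                      to the renderer driving the display wall
        """
        last = None
        while True:
            current = read_lens_encoder()
            if current != last:   # push gradual changes as they occur
                update_virtual_focus_model(*current)
                last = current
            time.sleep(period_s)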

The virtual space 150 also includes a second calibration image 220,whose position and orientation are selected to aid in the calibration oflens parameters of camera 120 b, and a corresponding virtual focus modelof a virtual camera placed at approximately the same position andorientation as the second camera 120 b. In this example, camera 120 bhas a focal point 310 b or focal plane 320 b that is proximate tovirtual object 180. This may be done for example such that virtualobject 180 is in sharp focus, whereas virtual object 190 is in softerfocus (e.g., to de-emphasize its importance in the mind of a viewer),while physical objects 160 and 170 are strongly out of focus (e.g., tomake them appear as part of the foreground).

In this example, calibration image 220 may be placed proximate tovirtual object 180 or virtual object 190, and may assist in the defocusprocess, whether manual or automated, by providing a reference as to thefocus level at the depth where the calibration image 220 is positioned.For example, if calibration image 220 is placed within focal plane 320b, then the focus parameters and virtual focus model may be adjusteduntil calibration image 220 appears in sharp focus. Alternatively, thecalibration image may be placed proximate to virtual object 190, and thefocus parameters and virtual focus model adjusted until calibrationimage 220 is in a desirable state of soft focus, or any other desiredfocus state.

It should be noted that depending on the camera type, lens type, lenssettings or parameters, or other variables, the camera may have a focalpoint, a focal plane, a focal region that is a spherical or parabolicsection, or other focal geometry. In some cases, focal parameters of oneor more cameras may be measured or adjusted based at least in part onimages taken of one or more calibration images. In some cases, two ormore cameras may be configured to generate stereoscopic images (e.g.,for 3D video). In some cases, two identical or nearly identicalcalibration images may be placed proximate to one another at a givendepth, for calibration of a stereo imaging process. In some cases, acalibration image may include 3D features, or 2D features at multipledepths or plane orientations, to facilitate calibration of one or morecameras in three dimensions. In some cases, a calibration image may beadjusted dynamically (e.g., moved, rotated, expanded, contracted,distorted, focused, or defocused) while a camera is moved, or while oneor more focal settings of the camera are adjusted. In some cases, acalibration image may be an animation.

In some aspects, one or more calibration steps may be employed to mimicor emulate the effect of a particular camera lens (e.g., a trick lens),such that shooting of the scene may take place with a wide or infinitedepth of field, while the lens effects are reproduced on the displaywall 102 using a virtual lens model and virtual camera that mimic thedesired lens. In some cases, such lenses may behave differently at theiredges than near their centers, and the calibration process may thus needto include calibrating the edges of a viewing frustum as well as itscenter.

FIG. 4 illustrates an example real-world stage 100, which includes avideo wall 102, behind which exists a virtual space 150 comprising avirtual scene 106, in accordance with at least one implementation of thepresent disclosure. Visible are first physical object 160, secondphysical object 170, first virtual object 180, second virtual object190, first camera 120 a, first viewing frustum 130 a, and first virtualviewing frustum 140 a.

In some implementations, rather than calculating an exact depth between a camera and a virtual object, it may be desirable to divide virtual space 150 or virtual scene 106 into depth slices or viewing slices, and to render all objects within a given slice at the same blur factor (e.g., the same level of blur or defocus). Such an approach may, for example, require less computation than the procedures described above, while providing visually similar results. In the example shown in FIG. 4 , a single depth slice or viewing slice 410 is shown, which contains virtual object 180. Depending on the implementation, any number of depth slices may be used, from two (e.g., a foreground slice and a background slice), to several hundred (such that, for example, different portions of the same object may be at different focus levels).
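A minimal sketch of such a slicing scheme, assuming per-object depths are already known, might assign every virtual object in the same slice a single blur factor; the names and example values below are illustrative assumptions.

    import numpy as np

    def assign_blur_by_slice(object_depths_m, slice_edges_m, slice_blur_px):
        """Map each object's depth to the blur factor of its depth slice.

        object_depths_m -- depth of each virtual object from the camera
        slice_edges_m   -- increasing slice boundaries, e.g. [0, 10, inf]
        slice_blur_px   -- one blur radius (in pixels) per slice
        """
        depths = np.asarray(object_depths_m, dtype=float)
        slice_index = np.searchsorted(slice_edges_m, depths, side="right") - 1
        slice_index = np.clip(slice_index, 0, len(slice_blur_px) - 1)
        return [slice_blur_px[i] for i in slice_index]

    # Two slices: a foreground slice (0-10 m) rendered sharp, a background
    # slice (beyond 10 m) blurred.
    blurs = assign_blur_by_slice([7.0, 25.0], [0.0, 10.0, np.inf], [0.0, 4.0])
    # blurs == [0.0, 4.0]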

FIG. 5 illustrates an example real-world stage 100, which includes avideo wall 102, behind which exists a virtual space 150 comprising avirtual scene 106, in accordance with at least one implementation of thepresent disclosure. Visible are first physical object 160, secondphysical object 170, first virtual object 180, second virtual object190, second camera 120 b, second viewing frustum 130 b, and secondvirtual viewing frustum 140 b.

In the example shown in FIG. 5 , two portions of virtual scene 106 have been identified as depth slices or viewing slices. Viewing slice 510, at a first depth D7 from the camera 120 b, contains virtual object 190, whereas viewing slice 520, at a second, further depth D8 from camera 120 b, contains virtual object 180. Such non-contiguous (e.g., spaced-apart) viewing slices may be employed if, for example, there are no intervening objects at depths larger than D7 and smaller than D8, or for other reasons (e.g., artistic preference or limited computing power). However, in other cases depth slices or viewing slices may be contiguous. Depth slices may be planar, may be spherical sections, or may be other shapes as appropriate to the lens characteristics of the camera. They may be contiguous or spaced apart, and may be the same shape or thickness or may be different shapes or thicknesses.

FIG. 6 illustrates an example real-world stage 100, which includes a video wall 102, behind which exists a virtual space 150 comprising a virtual scene 106, in accordance with at least one implementation of the present disclosure. Visible are first physical object 160, second physical object 170, first virtual object 180, second virtual object 190, second camera 120 b, second viewing frustum 130 b, and second virtual viewing frustum 140 b. In the example shown in FIG. 6 , virtual scene 106 has been divided into a plurality of subregions 600. Subregions 600 a-600 d may be depth slices or viewing slices as described above, and as shown in FIG. 6 , but are not limited to such configurations. Rather, virtual scene 106 may be divided into grid squares, hexagons, irregular shapes, or tessellating or quasicrystalline patterns of shapes, with each square, hexagon, or other subregion having a defined depth and corresponding level of blur. In some implementations, all virtual scene elements within a given subregion may be assigned the same blur level or blur effect, which may yield a visually convincing result, while requiring less computation than determining separate defocus parameters for each individual virtual object or scene element.

FIG. 7 illustrates an example video wall 102 divided into a plurality ofsubregions 700, in accordance with at least one implementation of thepresent disclosure. In some implementations, the virtual scene may bedivided into subregions based on display location on video wall 102itself (as shown in FIG. 7 ), rather than location within the virtualspace or virtual scene (as shown for example in FIG. 6 ). In the exampleshown in FIG. 7 , video wall 102 is divided into four subregions 700a-700 d, each of which may for example comprise pixels (e.g., an N × Npixel array or single pixel). This may be done for example if virtualobjects or scene elements (e.g., virtual objects 180 and 190) whoseimages appear near the top of the video wall 102 are located at agreater range than virtual objects or scene elements (e.g., sceneelements 710 and 720) whose images appear near the bottom of video wall102. In such a case, it may be desirable to apply one level of blur ordefocus to subregions 700 a and 700 b, and a different level of blur ordefocus to subregions 700 c and 700 d. In other cases, depending on therange or depth to each virtual object or scene element, it may bedesirable to apply a unique level of blur or defocus to each subregion700.
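As a non-limiting illustration, the wall image might be partitioned into fixed pixel blocks, with one blur level applied per block based on a representative depth for that block. The block size, the depth-to-blur mapping, and the use of a Gaussian filter below are assumptions made only for this sketch.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def blur_wall_by_subregion(wall_image, block_depths_m, depth_to_sigma,
                               block_size=256):
        """Apply one Gaussian blur level per square subregion of a wall image.

        wall_image     -- H x W x 3 float array of wall pixels rendered in focus
        block_depths_m -- 2D array with one representative virtual depth per block
        depth_to_sigma -- callable mapping a depth in metres to a blur sigma (px)
        """
        out = wall_image.copy()
        n_rows, n_cols = block_depths_m.shape
        for r in range(n_rows):
            for c in range(n_cols):
                sigma = float(depth_to_sigma(block_depths_m[r, c]))
                if sigma <= 0:
                    continue
                ys = slice(r * block_size, (r + 1) * block_size)
                xs = slice(c * block_size, (c + 1) * block_size)
                # Blur this block only; blurring blocks independently can leave
                # visible seams, which a production system would smooth over.
                out[ys, xs] = gaussian_filter(wall_image[ys, xs],
                                              sigma=(sigma, sigma, 0))
        return out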

The arrangement shown in FIG. 7 should be considered exemplary ratherthan limiting, as other arrangements may be used instead or in addition.For example, there may be more or fewer subregions. Subregions may be ofany size or shape, and may be the same as one another or different fromone another. Subregions may be contiguous or non-contiguous.

FIG. 8 shows a virtual production set 107, in accordance with at leastone implementation of the present disclosure. The virtual production set107 includes a virtual production display or display wall 102 which cansurround a stage 100. The display wall 102 can include multiple screens102A, 102B, 102C. One or more of the screens 102A, 102B, 102C can becurved. The size of each screen 102A, 102B, 102C can correspond to asize of a wall and exceed 6 feet in height and 3 feet in width. Thestage 100 may be sufficiently large to include multiple actors 830 andprops 840, 850, 860. The stage 100 can integrate with the screens 102A,102B, 102C presenting images 815A, 815B, 815C, respectively. Forexample, the stage 100 can include props, such as rocks 850 and sand860, that mimic the appearance of rocks 870 and sand 880 that appear asimages on the display wall 102. The display wall 102 illuminates thestage 100, actors 830, and props 840, 850, 860. Thus, the lighting ofthe real and virtual environments can match the lighting of the actors830 and props 840, 850, 860. In particular, highly reflective surfaces,such as metallic surfaces, can properly reflect both the real andvirtual environments. In addition to the illumination from display wall102, additional lights 890 can illuminate the stage 100.

The display wall 102 needs to update the images 815A, 815B, 815C to reflect events on the stage 100, such as motion of the actors 830, parallax to correctly create a sense of depth, interaction between the actors 830 and the images 815A-815C, etc. In other words, the display wall 102 may need to render in real time. A rendering engine 1250, such as Unreal Engine or Gazebo, running on a processor 1300 can render the images 815A-815C in real time in response to events on the stage 100. The rendering engine 1250 can communicate with a camera 805 using a wired or a wireless network. The camera 805 can record the stage 100, including images presented on the display wall 102, actors 830, and props 840, 850, 860. The camera 805 and the processor 1300 may communicate wirelessly, such that the rendering engine 1250 can track camera movement and/or changes in camera focus.
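By way of illustration only, the real-time update might be organized as a loop that polls tracked camera state and re-renders the wall imagery each frame. Every function name below is a hypothetical placeholder rather than an API of Unreal Engine, Gazebo, or any other engine.

    import time

    def wall_update_loop(get_tracked_camera_state, render_wall_images,
                         present_on_wall, target_fps=48.0):
        """Re-render display-wall imagery as tracked camera state changes."""
        frame_period = 1.0 / target_fps
        while True:
            start = time.monotonic()
            pose, focus_settings = get_tracked_camera_state()  # hypothetical
            images = render_wall_images(pose, focus_settings)  # hypothetical
            present_on_wall(images)                            # hypothetical
            elapsed = time.monotonic() - start
            time.sleep(max(0.0, frame_period - elapsed))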

FIG. 9 is a flowchart of an exemplary method 900 as might be performedby an image processor to defocus a sharp rendered image, in accordancewith at least one implementation of the present disclosure. Note thatone or more of the steps of method 900 may be combined, omitted, orperformed in a different order in different implementations.

In step 910, the method includes locating the position and orientationof a real-world camera on the real-world stage. Position and orientationsensors of various types may be used for this step (whether on thecamera, on the video wall, or elsewhere on the stage), or the positionand orientation may be estimated, or may be manually entered orautomatically read from, e.g., markings on the floor, etc. In someimplementations, this step also includes locating the video wall on thestage.

In step 920, the method includes placing a virtual camera within avirtual space, where the position and orientation of the virtual cameraare at least approximately the same as the position and orientation ofthe real-world camera on the real-world stage.

In step 930, the method includes placing a virtual scene element in thevirtual space. In most cases, the virtual scene element will be located“behind” the video wall and within a viewing frustum of the virtualcamera. However, in some cases, virtual scene elements (e.g., lightsources, shadow sources, reflectors, etc.) that are not visible to thevirtual camera may also be placed, if they are believed to affect thevisual appearance of virtual scene elements that are within view of thevirtual camera, or of real-world scene elements located on thereal-world stage and within the view of the real-world camera. In someimplementations, rendering of the virtual scene takes place at thisstep, along with display of the virtual scene on the video wall.

In step 940, the method includes obtaining, from the real-world camera,an indication of a lens function, wherein a lens function represents alens shape (e.g., circular, hexagonal, or a different shape) and/or alens effect (e.g., chromatic aberration, warping, fisheye, etc.). Thelens shape may correspond to an aperture shape in a physical camera lens(e.g., the lens which would produce the defocus effect the imageprocessor is emulating). Other lens parameters may also be obtained,including zoom level, depth of field, a focal point, a focal length,aberration, and/or bokeh, vignette, or lemmoning effects.

In step 950, the method includes generating a virtual focus model for the virtual camera. In some implementations, the virtual focus model may for example include at least an approximation of at least some of the parameters of the lens function of step 940. The virtual focus model may for example account for the position of the video wall, such that the real and virtual components of the range to the virtual scene element can be determined. In an example, defocus due to the real component of the range does not need to be simulated or otherwise accounted for, as it is physically determined by the actual focus of the physical camera lens. Thus, the adjusted focus (e.g., blur or defocus effects) may only need to be added to the virtual scene element according to the virtual component of the range.

In other implementations, the virtual focus model may be simpler thanthe lens(es) of the real-world camera, if only approximate adjustedfocus or defocusing effects are desired.

In step 960, the method includes determining an adjusted focus (e.g.,defocus or blurring effect) for the virtual object, based on the virtualfocus model and a range or depth (e.g., relative position or distance)between the virtual camera and the virtual scene element, or between thevideo wall and the virtual scene element. In some implementations, thisdefocus or blurring effect is applied post-rendering, e.g., applied tothe virtual scene element within a rendered image of the virtual scene.In other implementations, rendering of the virtual scene occurs at thisstep and includes the adjusted focus, defocus, or blurring effect. Ineither case, the rendered image with the adjusted focus is displayed onthe video wall, where it can serve as a smart, defocusable backdrop forthe real-world stage.
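As one illustrative, non-limiting post-rendering realization of step 960, a blur strength derived from the virtual focus model might be applied only to the pixels of the virtual scene element, for example with a Gaussian filter standing in as a rough approximation of lens defocus. The mask-based approach and the Gaussian kernel are assumptions of this sketch.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def apply_adjusted_focus(rendered_image, element_mask, blur_sigma_px):
        """Blur only the masked virtual scene element in a rendered wall image.

        rendered_image -- H x W x 3 float array rendered in sharp focus
        element_mask   -- H x W boolean array selecting the virtual element
        blur_sigma_px  -- blur strength derived from the virtual focus model
        """
        blurred = gaussian_filter(rendered_image,
                                  sigma=(blur_sigma_px, blur_sigma_px, 0))
        out = rendered_image.copy()
        out[element_mask] = blurred[element_mask]
        return out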

FIG. 10 is a flowchart of an exemplary method 1000 as might be performedby an image processor to defocus a sharp rendered image, in accordancewith at least one implementation of the present disclosure. Note thatone or more of the steps of method 1000 may be combined, omitted, orperformed in a different order in different implementations.

In step 1010, the method includes locating the position and orientationof a real-world camera on the real-world stage. Position and orientationsensors of various types may be used for this step (whether onboard thecamera or elsewhere on the stage), or the position and orientation maybe estimated, or may be manually entered or automatically read from,e.g., markings on the floor, etc.

In step 1020, the method includes placing a virtual camera within avirtual space, where the position and orientation of the virtual cameraare at least approximately the same as the position and orientation ofthe real-world camera on the real-world stage.

In step 1030, the method includes placing one or more virtual sceneelements in the virtual space. In most cases, the virtual scene elementswill be located “behind” the video wall and within a viewing frustum ofthe virtual camera. However, in some cases, virtual scene elements(e.g., light sources, shadow sources, reflectors, etc.) that are notvisible to the virtual camera may also be placed, if they are believedto affect the visual appearance of virtual scene elements that arewithin view of the virtual camera, or the appearance of real-world sceneelements located on the real-world stage and within the view of thereal-world camera (e.g., by casting light, reflection, or shadow). Insome implementations, rendering of the virtual scene takes place at thisstep, along with display of the virtual scene on the video wall.

In step 1040, the method includes choosing desired defocus effects. Thismay involve, for example, all objects within a certain range of depthsbeing assigned a first adjusted focus (e.g., blur or defocus level,whether calculated or artistically chosen), all objects within a secondrange of depths being assigned a second adjusted focus, etc.

In step 1050, the method includes dividing the virtual scene into depthslices or other subregions. In some cases, depth slices or othersubregions may be generated automatically, based on geometry. In othercases, depth slices or other subregions may be selected specifically toinclude certain virtual scene elements and exclude certain other virtualscene elements.
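As an illustration of automatic slice generation, slice boundaries might be derived from the distribution of virtual element depths, for example by splitting wherever consecutive depths differ by a large gap so that clustered elements share a slice. This heuristic, and the names below, are assumptions for illustration only.

    import numpy as np

    def slice_edges_from_depths(object_depths_m, min_gap_m=2.0):
        """Place a slice boundary wherever consecutive object depths differ by
        more than min_gap_m, so that clustered objects share a slice."""
        depths = np.sort(np.asarray(object_depths_m, dtype=float))
        edges = [0.0]
        for prev, nxt in zip(depths[:-1], depths[1:]):
            if nxt - prev > min_gap_m:
                edges.append((prev + nxt) / 2.0)  # boundary between two clusters
        edges.append(np.inf)
        return edges

    # Example: objects at 3 m, 4 m, and 12 m yield slice edges [0.0, 8.0, inf].
    edges = slice_edges_from_depths([3.0, 4.0, 12.0])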

In step 1060, the method includes applying the adjusted focus, defocus,or blurring effect to the virtual objects, based on which depth slice orsubregion they are located in. In some implementations, this adjustedfocus is applied to the virtual scene elements within a rendered imageof the virtual scene. In other implementations, rendering of the virtualscene occurs at this step and includes the adjusted focus, defocus, orblurring effect. In either case, the rendered image with the adjustedfocus or blurring effects is displayed on the video wall, where it canserve as a smart, defocusable backdrop for the real-world stage.

FIG. 11 illustrates an example of defocusing a virtual object, inaccordance with at least one implementation of the present disclosure.Such blurring may for example involve creation of a blur transparencymap 1110 for a portion of a rendered image 1100. Here, a box 1106 ispositioned behind a smaller box 1104, and in front of a background 1108.In an example, box 1106 is to be defocused, in a realistic way thatincludes information about what is behind the edges of box 1106. Imageportion 1100 may not include color information for the portion of box1106 obscured by box 1104, nor of the portion of background 1108obscured by box 1106. Defocusing box 1106 may introduce unwantedtransparency (also referred to as artifacts) to pixels. The artifactsshould be corrected, while leaving other pixels with intentionaltransparency untouched. An image processor may create a blurtransparency map 1110 to protect intentionally transparent pixels ofimage portion 1100, while correcting artifacts of the defocusingprocess, as described for example in U.S. Pat. Application No.17/086,032, filed Oct. 30, 2020, hereby incorporated in its entirety asthough fully set forth herein.

The image processor may first perform edge detection and produce a modified image or blur transparency map 1110 that includes raw alpha channel output from image 1100 during the defocusing process. The dark areas of image 1110 represent pixels that include transparency, both intentionally (e.g., as a result of softening the edges of box 1106) and as an unintended result of the defocusing process, with darker areas corresponding to a higher degree of transparency than less dark areas. For example, the dark area outside of region 1106 may be the intentional result of the defocusing process and should not be corrected. The dark area around region 1104, however, may be an unintended artifact introduced during the defocusing process that should be corrected. For example, while defocusing box 1106, unintentional transparency may have been introduced around box 1104 because of missing color information caused by box 1104 obscuring box 1106. The image processor may create a mask 1114 (represented here as a region filled with diagonal lines) to designate areas of the image portion 1100 that may contain artifacts that should be corrected, resulting in blur transparency map 1110. When the image processor blends color values for pixels, it will exclude pixels outside the mask 1114. The result of defocusing box 1106 is illustrated in a selectively defocused image 1120. As illustrated in selectively defocused image 1120, the transparent area around box 1104 introduced as part of defocusing box 1106 has been removed, but the transparent area along the outside of box 1106 (giving box 1106 a softened appearance) has been preserved.
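A much-simplified, non-limiting sketch of this artifact correction might restore opacity only inside a correction mask, leaving intentional edge transparency untouched. The logic below is illustrative and omits the edge-detection and color-blending details of the referenced application.

    import numpy as np

    def correct_blur_transparency(blurred_alpha, original_alpha, correction_mask):
        """Undo unintended transparency introduced by defocusing.

        blurred_alpha   -- H x W alpha channel after the defocus operation
        original_alpha  -- H x W alpha channel before the defocus operation
        correction_mask -- H x W boolean array marking regions (e.g., around an
                           occluding foreground box) where new transparency is an
                           artifact rather than an intended soft edge
        """
        corrected = blurred_alpha.copy()
        # Inside the mask, never allow alpha to drop below its original value.
        corrected[correction_mask] = np.maximum(
            blurred_alpha[correction_mask], original_alpha[correction_mask])
        return corrected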

FIG. 12 illustrates an example visual content generation system 1200 asmight be used to generate imagery in the form of still images and/orvideo sequences of images, in accordance with at least oneimplementation of the present disclosure. Visual content generationsystem 1200 might generate imagery of live action scenes, computergenerated scenes, or a combination thereof. In a practical system, usersare provided with tools that allow them to specify, at high levels andlow levels where necessary, what is to go into that imagery. Forexample, a user might be an animation artist and might use visualcontent generation system 1200 to capture interaction between two humanactors performing live on a sound stage and replace one of the humanactors with a computer-generated anthropomorphic non-human being thatbehaves in ways that mimic the replaced human actor’s movements andmannerisms, and then add in a third computer-generated character andbackground scene elements that are computer-generated, all in order totell a desired story or generate desired imagery.

Still images that are output by visual content generation system 1200 might be represented in computer memory as pixel arrays, such as a two-dimensional array of pixel color values, each associated with a pixel having a position in a two-dimensional image array. Pixel color values might be represented by three or more (or fewer) color values per pixel, such as a red value, a green value, and a blue value (e.g., in RGB format). Dimensions of such a two-dimensional array of pixel color values might correspond to a preferred and/or standard display scheme, such as 1920-pixel columns by 1280-pixel rows or 4096-pixel columns by 2160-pixel rows, or some other resolution. Images might or might not be stored in a compressed format, but either way, a desired image may be represented as a two-dimensional array of pixel color values. In another variation, images are represented by a pair of stereo images for three-dimensional presentations and in other variations, an image output, or a portion thereof, might represent three-dimensional imagery instead of just two-dimensional views. In yet other implementations, pixel values are data structures and a pixel value is associated with a pixel and can be a scalar value, a vector, or another data structure associated with a corresponding pixel. That pixel value might include color values, or not, and might include depth values, alpha values, weight values, object identifiers or other pixel value components.

A stored video sequence might include a plurality of images such as thestill images described above, but where each image of the plurality ofimages has a place in a timing sequence and the stored video sequence isarranged so that when each image is displayed in order, at a timeindicated by the timing sequence, the display presents what appears tobe moving and/or changing imagery. In one representation, each image ofthe plurality of images is a video frame having a specified frame numberthat corresponds to an amount of time that would elapse from when avideo sequence begins playing until that specified frame is displayed. Aframe rate might be used to describe how many frames of the stored videosequence are displayed per unit time. Example video sequences mightinclude 24 frames per second (24 FPS), 50 FPS, 140 FPS, or other framerates. In some implementations, frames are interlaced or otherwisepresented for display, but for clarity of description, in some examples,it is assumed that a video frame has one specified display time, butother variations might be contemplated.

One method of creating a video sequence is to simply use a video camerato record a live action scene, i.e., events that physically occur andcan be recorded by a video camera. The events being recorded can beevents to be interpreted as viewed (such as seeing two human actors talkto each other) and/or can include events to be interpreted differentlydue to clever camera operations (such as moving actors about a stage tomake one appear larger than the other despite the actors actually beingof similar build, or using miniature objects with other miniatureobjects so as to be interpreted as a scene containing life-sizedobjects).

Creating video sequences for story-telling or other purposes often callsfor scenes that cannot be created with live actors, such as a talkingtree, an anthropomorphic object, space battles, and the like. Such videosequences might be generated computationally rather than capturing lightfrom live scenes. In some instances, an entirety of a video sequencemight be generated computationally, as in the case of acomputer-animated feature film. In some video sequences, it is desirableto have some computer-generated imagery and some live action, perhapswith some careful merging of the two.

While computer-generated imagery might be creatable by manuallyspecifying each color value for each pixel in each frame, this is likelytoo tedious to be practical. As a result, a creator uses various toolsto specify the imagery at a higher level. As an example, an artist mightspecify the positions in a scene space, such as a three-dimensionalcoordinate system, of objects and/or lighting, as well as a cameraviewpoint, and a camera view plane. From that, a rendering engine couldtake all of those as inputs, and compute each of the pixel color valuesin each of the frames. In another example, an artist specifies positionand movement of an articulated object having some specified texturerather than specifying the color of each pixel representing thatarticulated object in each frame.

In a specific example, a rendering engine performs ray tracing wherein apixel color value is determined by computing which objects lie along aray traced in the scene space from the camera viewpoint through a pointor portion of the camera view plane that corresponds to that pixel. Forexample, a camera view plane might be represented as a rectangle havinga position in the scene space that is divided into a grid correspondingto the pixels of the ultimate image to be generated, and if a raydefined by the camera viewpoint in the scene space and a given pixel inthat grid first intersects a solid, opaque, blue object, that givenpixel is assigned the color blue. Of course, for moderncomputer-generated imagery, determining pixel colors — and therebygenerating imagery — can be more complicated, as there are lightingissues, reflections, interpolations, and other considerations.

As illustrated in FIG. 12 , a live action capture system 1202 captures alive scene that plays out on a stage 1204. Live action capture system1202 is described herein in greater detail, but might include computerprocessing capabilities, image processing capabilities, one or moreprocessors, program code storage for storing program instructionsexecutable by the one or more processors, as well as user input devicesand user output devices, not all of which are shown.

In a specific live action capture system, cameras 1206(1) and 1206(2)capture the scene, while in some systems, there might be other sensor(s)1208 that capture information from the live scene (e.g., infraredcameras, infrared sensors, motion capture (“mo-cap”) detectors, etc.).On stage 1204, there might be human actors, animal actors, inanimateobjects, background objects, and possibly an object such as a greenscreen 1210 that is designed to be captured in a live scene recording insuch a way that it is easily overlaid with computer-generated imagery.Stage 1204 might also contain objects that serve as fiducials, such asfiducials 1212(1)-(3), that might be used post-capture to determinewhere an object was during capture. A live action scene might beilluminated by one or more lights, such as an overhead light 1214.

During or following the capture of a live action scene, live actioncapture system 1202 might output live action footage to a live actionfootage storage 1220. A live action processing system 1222 might processlive action footage to generate data about that live action footage andstore that data into a live action metadata storage 1224. Live actionprocessing system 1222 might include computer processing capabilities,image processing capabilities, one or more processors, program codestorage for storing program instructions executable by the one or moreprocessors, as well as user input devices and user output devices, notall of which are shown. Live action processing system 1222 might processlive action footage to determine boundaries of objects in a frame ormultiple frames, determine locations of objects in a live action scene,where a camera was relative to some action, distances between movingobjects and fiducials, etc. Where elements have sensors attached to themor are detected, the metadata might include location, color, andintensity of overhead light 1214, as that might be useful inpost-processing to match computer-generated lighting on objects that arecomputer-generated and overlaid on the live action footage. Live actionprocessing system 1222 might operate autonomously, perhaps based onpredetermined program instructions, to generate and output the liveaction metadata upon receiving and inputting the live action footage.The live action footage can be camera-captured data as well as data fromother sensors.

An animation creation system 1230 is another part of visual contentgeneration system 1200. Animation creation system 1230 might includecomputer processing capabilities, image processing capabilities, one ormore processors, program code storage for storing program instructionsexecutable by the one or more processors, as well as user input devicesand user output devices, not all of which are shown. Animation creationsystem 1230 might be used by animation artists, managers, and others tospecify details, perhaps programmatically and/or interactively, ofimagery to be generated. From user input and data from a database orother data source, indicated as a data store 1232, animation creationsystem 1230 might generate and output data representing objects (e.g., ahorse, a human, a ball, a teapot, a cloud, a light source, a texture,etc.) to an object storage 1234, generate and output data representing ascene into a scene description storage 1236, and/or generate and outputdata representing animation sequences to an animation sequence storage1238.

Scene data might indicate locations of objects and other visual elements, values of their parameters, lighting, camera location, camera view plane, and other details that a rendering engine 1250 might use to render CGI imagery. For example, scene data might include the locations of several articulated characters, background objects, lighting, etc. specified in a two-dimensional space, three-dimensional space, or other dimensional space (such as a 2.5-dimensional space, three-quarter dimensions, pseudo-3D spaces, etc.) along with locations of a camera viewpoint and view plane from which to render imagery. For example, scene data might indicate that there is to be a red, fuzzy, talking dog in the right half of a video and a stationary tree in the left half of the video, all illuminated by a bright point light source that is above and behind the camera viewpoint. In some cases, the camera viewpoint is not explicit, but can be determined from a viewing frustum. In the case of imagery that is to be rendered to a rectangular view, the frustum would be a truncated pyramid. Other shapes for a rendered view are possible and the camera view plane could be different for different shapes.

Animation creation system 1230 might be interactive, allowing a user toread in animation sequences, scene descriptions, object details, etc.and edit those, possibly returning them to storage to update or replaceexisting data. As an example, an operator might read in objects fromobject storage into a baking processor 1242 that would transform thoseobjects into simpler forms and return those to object storage 1234 asnew or different objects. For example, an operator might read in anobject that has dozens of specified parameters (movable joints, coloroptions, textures, etc.), select some values for those parameters andthen save a baked object that is a simplified object with now fixedvalues for those parameters.

Rather than requiring user specification of each detail of a scene, datafrom data store 1232 might be used to drive object presentation. Forexample, if an artist is creating an animation of a spaceship passingover the surface of the Earth, instead of manually drawing or specifyinga coastline, the artist might specify that animation creation system1230 is to read data from data store 1232 in a file containingcoordinates of Earth coastlines and generate background elements of ascene using that coastline data.

Animation sequence data might be in the form of time series of data forcontrol points of an object that has attributes that are controllable.For example, an object might be a humanoid character with limbs andjoints that are movable in manners similar to typical human movements.An artist can specify an animation sequence at a high level, such as“the left hand moves from location (X1, Y1, Z1) to (X2, Y2, Z2) overtime T1 to T2”, at a lower level (e.g., “move the elbow joint 2.5degrees per frame”) or even at a very high level (e.g., “character Ashould move, consistent with the laws of physics that are given for thisscene, from point P1 to point P2 along a specified path”).

Animation sequences in an animated scene might be specified by whathappens in a live action scene. An animation driver generator 1244 mightread in live action metadata, such as data representing movements andpositions of body parts of a live actor during a live action scene.Animation driver generator 1244 might generate corresponding animationparameters to be stored in animation sequence storage 1238 for use inanimating a CGI object. This can be useful where a live action scene ofa human actor is captured while wearing mo-cap fiducials (e.g.,high-contrast markers outside actor clothing, high-visibility paint onactor skin, face, etc.) and the movement of those fiducials isdetermined by live action processing system 1222. Animation drivergenerator 1244 might convert that movement data into specifications ofhow joints of an articulated CGI character are to move over time.

A rendering engine 1250 can read in animation sequences, scenedescriptions, and object details, as well as rendering engine controlinputs, such as a resolution selection and a set of renderingparameters. Resolution selection might be useful for an operator tocontrol a trade-off between speed of rendering and clarity of detail, asspeed might be more important than clarity for a movie maker to testsome interaction or direction, while clarity might be more importantthan speed for a movie maker to generate data that will be used forfinal prints of feature films to be distributed. Rendering engine 1250might include computer processing capabilities, image processingcapabilities, one or more processors, program code storage for storingprogram instructions executable by the one or more processors, as wellas user input devices and user output devices, not all of which areshown.

Visual content generation system 1200 can also include a merging system1260 that merges live footage with animated content. The live footagemight be obtained and input by reading from live action footage storage1220 to obtain live action footage, by reading from live action metadatastorage 1224 to obtain details such as presumed segmentation in capturedimages segmenting objects in a live action scene from their background(perhaps aided by the fact that green screen 1210 was part of the liveaction scene), and by obtaining CGI imagery from rendering engine 1250.

A merging system 1260 might also read data from rulesets formerging/combining storage 1262. A very simple example of a rule in aruleset might be “obtain a full image including a two-dimensional pixelarray from live footage, obtain a full image including a two-dimensionalpixel array from rendering engine 1250, and output an image where eachpixel is a corresponding pixel from rendering engine 1250 when thecorresponding pixel in the live footage is a specific color of green,otherwise output a pixel value from the corresponding pixel in the livefootage.”
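Expressed as code, such a rule might look like the following Python sketch, with an exact-match green test standing in for whatever chroma-key tolerance a production system would actually use; the function name and key color are assumptions for illustration.

    import numpy as np

    def merge_by_green(live_frame, cgi_frame, key_color=(0, 255, 0)):
        """Per-pixel merge: take the CGI pixel wherever the live pixel is key green."""
        is_key = np.all(live_frame == np.asarray(key_color), axis=-1)
        out = live_frame.copy()
        out[is_key] = cgi_frame[is_key]
        return out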

Merging system 1260 might include computer processing capabilities,image processing capabilities, one or more processors, program codestorage for storing program instructions executable by the one or moreprocessors, as well as user input devices and user output devices, notall of which are shown. Merging system 1260 might operate autonomously,following programming instructions, or might have a user interface orprogrammatic interface over which an operator can control a mergingprocess. In some implementations, an operator can specify parametervalues to use in a merging process and/or might specify specific tweaksto be made to an output of merging system 1260, such as modifyingboundaries of segmented objects, inserting blurs to smooth outimperfections, or adding other effects. Based on its inputs, mergingsystem 1260 can output an image to be stored in a static image storage1270 and/or a sequence of images in the form of video to be stored in ananimated/combined video storage 1272.

Thus, as described, visual content generation system 1200 can be used to generate video that combines live action with computer-generated animation using various components and tools, some of which are described in more detail herein. While visual content generation system 1200 might be useful for such combinations, with suitable settings, it can be used for outputting entirely live action footage or entirely CGI sequences.

According to one implementation, the techniques described herein areimplemented by one or more generalized computing systems programmed toperform the techniques pursuant to program instructions in firmware,memory, other storage, or a combination. Special-purpose computingdevices may be used, such as desktop computer systems, portable computersystems, handheld devices, networking devices or any other device thatincorporates hard-wired and/or program logic to implement thetechniques.

For example, FIG. 13 is a block diagram that illustrates a computersystem 1300 upon which the computer systems of the systems describedherein and/or visual content generation system 1200 (see FIG. 12 ) maybe implemented, in accordance with at least one implementation of thepresent disclosure. Computer system 1300 includes a bus 1302 or othercommunication mechanism for communicating information, and a processor1304 coupled with bus 1302 for processing information. Processor 1304may be, for example, a general-purpose microprocessor.

Computer system 1300 also includes a main memory 1306, such as arandom-access memory (RAM) or other dynamic storage device, coupled tobus 1302 for storing information and instructions to be executed byprocessor 1304. Main memory 1306 may also be used for storing temporaryvariables or other intermediate information during execution ofinstructions to be executed by processor 1304. Such instructions, whenstored in non-transitory storage media accessible to processor 1304,render computer system 1300 into a special-purpose machine that iscustomized to perform the operations specified in the instructions.

Computer system 1300 further includes a read only memory (ROM) 1308 orother static storage device coupled to bus 1302 for storing staticinformation and instructions for processor 1304. A storage device 1310,such as a magnetic disk or optical disk, is provided and coupled to bus1302 for storing information and instructions.

Computer system 1300 may be coupled via bus 1302 to a display 1312, suchas a computer monitor, for displaying information to a computer user. Aninput device 1314, including alphanumeric and other keys, is coupled tobus 1302 for communicating information and command selections toprocessor 1304. Another type of user input device is a cursor control1316, such as a mouse, a trackball, or cursor direction keys forcommunicating direction information and command selections to processor1304 and for controlling cursor movement on display 1312. This inputdevice typically has two degrees of freedom in two axes, a first axis(e.g., x) and a second axis (e.g., y), that allows the device to specifypositions in a plane.

Computer system 1300 may implement the techniques described herein usingcustomized hard-wired logic, one or more ASICs or FPGAs, firmware and/orprogram logic which in combination with the computer system causes orprograms computer system 1300 to be a special-purpose machine. Accordingto one implementation, the techniques herein are performed by computersystem 1300 in response to processor 1304 executing one or moresequences of one or more instructions contained in main memory 1306.Such instructions may be read into main memory 1306 from another storagemedium, such as storage device 1310. Execution of the sequences ofinstructions contained in main memory 1306 causes processor 1304 toperform the process steps described herein. In alternativeimplementations, hard-wired circuitry may be used in place of or incombination with software instructions.

The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may include non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1310. Volatile media includes dynamic memory, such as main memory 1306. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.

Storage media is distinct from but may be used in conjunction withtransmission media. Transmission media participates in transferringinformation between storage media. For example, transmission mediaincludes coaxial cables, copper wire, and fiber optics, including thewires that include bus 1302. Transmission media can also take the formof acoustic or light waves, such as those generated during radio-waveand infra-red data communications.

Various forms of media may be involved in carrying one or more sequencesof one or more instructions to processor 1304 for execution. Forexample, the instructions may initially be carried on a magnetic disk orsolid-state drive of a remote computer. The remote computer can load theinstructions into its dynamic memory and send the instructions over anetwork connection. A modem or network interface local to computersystem 1300 can receive the data. Bus 1302 carries the data to mainmemory 1306, from which processor 1304 retrieves and executes theinstructions. The instructions received by main memory 1306 mayoptionally be stored on storage device 1310 either before or afterexecution by processor 1304.

Computer system 1300 also includes a communication interface 1318coupled to bus 1302. Communication interface 1318 provides a two-waydata communication coupling to a network link 1320 that is connected toa local network 1322. For example, communication interface 1318 may be anetwork card, a modem, a cable modem, or a satellite modem to provide adata communication connection to a corresponding type of telephone lineor communications line. Wireless links may also be implemented. In anysuch implementation, communication interface 1318 sends and receiveselectrical, electromagnetic, or optical signals that carry digital datastreams representing various types of information.

Network link 1320 typically provides data communication through one or more networks to other data devices. For example, network link 1320 may provide a connection through local network 1322 to a host computer 1324 or to data equipment operated by an Internet Service Provider (ISP) 1326. ISP 1326 in turn provides data communication services through the world-wide packet data communication network now commonly referred to as the “Internet” 1328. Local network 1322 and Internet 1328 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1320 and through communication interface 1318, which carry the digital data to and from computer system 1300, are example forms of transmission media.

Computer system 1300 can send messages and receive data, including program code, through the network(s), network link 1320, and communication interface 1318. In the Internet example, a server 1330 might transmit a requested code for an application program through the Internet 1328, ISP 1326, local network 1322, and communication interface 1318. The received code may be executed by processor 1304 as it is received, and/or stored in storage device 1310, or other non-volatile storage for later execution.

Operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. Processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. The code may be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable storage medium may be non-transitory. The code may also be carried by a transitory computer-readable medium, e.g., a transmission medium such as a signal transmitted over a network.

Conjunctive language, such as phrases of the form “at least one of A, B, and C,” or “at least one of A, B and C,” unless specifically stated otherwise or otherwise clearly contradicted by context, is otherwise understood with the context as used in general to present that an item, term, etc., may be either A or B or C, or any nonempty subset of the set of A and B and C. For instance, in the illustrative example of a set having three members, the conjunctive phrases “at least one of A, B, and C” and “at least one of A, B and C” refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Thus, such conjunctive language is not generally intended to imply that certain implementations require at least one of A, at least one of B, and at least one of C each to be present.

The use of examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate implementations of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

In the foregoing specification, implementations of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Further implementations can be envisioned by one of ordinary skill in the art after reading this disclosure. In other implementations, combinations or sub-combinations of the above-disclosed invention can be advantageously made. The example arrangements of components are shown for purposes of illustration, and combinations, additions, rearrangements, and the like are contemplated in alternative implementations of the present invention. Thus, while the invention has been described with respect to exemplary implementations, one skilled in the art will recognize that numerous modifications are possible.

For example, the processes described herein may be implemented using hardware components, software components, and/or any combination thereof. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims and that the invention is intended to cover all modifications and equivalents within the scope of the following claims.

All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

1. A computer-implemented method of generating a virtual scene rendering usable in a captured scene, the method comprising: determining a camera position of a camera in a stage environment; determining a display position of a virtual scene display in the stage environment; determining a virtual focus model, based at least on a relative positioning as between the camera position and the display position, wherein the virtual focus model is represented by a focus model data structure defining how focus should be applied to virtual scene elements in the virtual scene to be presented on the virtual scene display while the camera captures imagery of the stage environment including the virtual scene display; determining a depth value for a given virtual scene element, wherein the depth value corresponds to a virtual distance in the virtual scene between the given virtual scene element and a virtual camera viewpoint and/or the virtual scene display; determining an adjusted focus, in the virtual scene, of the given virtual scene element based on at least the depth value and the relative positioning; and rendering the virtual scene taking into account the adjusted focus.
2. The computer-implemented method of claim 1, wherein the virtual focus model comprises data specifying at least one of a depth of field for the camera, a focal point for the camera, and/or a focal length of a lens of the camera.
3. The computer-implemented method of claim 1, wherein the virtual focus model comprises data specifying a set of bokeh, vignette, or lemmoning effects for one or more virtual scene elements to be applied to the virtual scene elements when rendering the virtual scene.
4. The computer-implemented method of claim 1, wherein the captured scene comprises an optical view of the stage environment, wherein the stage environment is a movie set, and wherein the virtual scene display is positioned in the stage environment further from the camera than at least one live actor visible in a camera scene captured by the camera.
5. The computer-implemented method of claim 1, wherein the adjusted focus comprises a defocus of the given virtual scene element to, at least approximately, match a presumed defocus the given virtual scene element would have if the given virtual scene element were present at a presumed distance from the camera that corresponds to a function of a first distance from the camera to the virtual scene display and the depth value of the given virtual scene element.
6. The computer-implemented method of claim 5, wherein the function of the first distance and the depth value is a sum of the first distance and the depth value.
7. The computer-implemented method of claim 1, wherein determining the camera position comprises reading data from camera position sensors placed on the camera.
8. The computer-implemented method of claim 1, wherein determining the display position comprises reading data from display position sensors placed on the virtual scene display.
9. The computer-implemented method of claim 1, wherein determining the camera position and determining the display position comprise receiving manually entered position data.
10. The computer-implemented method of claim 1, wherein determining the virtual focus model is based, at least in part, on predetermined parameters defining lens characteristics of a lens used on the camera.
11. The computer-implemented method of claim 1, wherein the virtual scene display comprises an LED wall.
12. The computer-implemented method of claim 11, wherein the LED wall is positioned as a background in the stage environment and spans an entirety of a scene captured by the camera.
13. The computer-implemented method of claim 11, wherein the LED wall is planar, piecewise planar, or wherein at least a portion of the LED wall has a curved portion.
14. The computer-implemented method of claim 1, wherein the virtual scene display is planar and perpendicular to a camera lens axis.
15. A non-transitory computer-readable storage medium storing instructions, which when executed by at least one processor of a computer system, cause the computer system to carry out the method of claim 1.
16. (canceled)
17. A computer system comprising: one or more processors; and a storage medium storing instructions, which when executed by the one or more processors, cause the computer system to implement the method of claim 1.
18. (canceled)
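
By way of example and not limitation, the following sketch illustrates one way the method of claims 1, 5, and 6 might be carried out by a renderer driving a virtual scene display. The names used (FocusModel, adjusted_defocus_mm, and so on) are hypothetical, and the thin-lens circle-of-confusion formula is used here only as one plausible measure of defocus; the claims do not require this particular formula.

# Illustrative sketch only, under the assumptions stated above.
from dataclasses import dataclass


@dataclass
class FocusModel:
    """Hypothetical focus model data structure built from camera/lens settings."""
    focal_length_mm: float    # focal length of the camera lens
    f_number: float           # aperture f-stop
    focus_distance_mm: float  # distance at which the physical camera is focused


def circle_of_confusion_mm(model: FocusModel, subject_distance_mm: float) -> float:
    """Thin-lens circle-of-confusion diameter for a subject at the given distance."""
    f = model.focal_length_mm
    aperture = f / model.f_number
    s_focus = model.focus_distance_mm
    return abs(aperture * f * (subject_distance_mm - s_focus)
               / (subject_distance_mm * (s_focus - f)))


def adjusted_defocus_mm(model: FocusModel,
                        camera_to_wall_mm: float,
                        virtual_depth_mm: float) -> float:
    """Defocus for a virtual element as if it were at the summed distance
    (camera-to-display distance plus the element's virtual depth)."""
    presumed_distance_mm = camera_to_wall_mm + virtual_depth_mm
    return circle_of_confusion_mm(model, presumed_distance_mm)


if __name__ == "__main__":
    model = FocusModel(focal_length_mm=50.0, f_number=2.8, focus_distance_mm=4000.0)
    # Camera is 3 m from the LED wall; the element is 5 m behind the wall plane,
    # so it is blurred as if it were 8 m from the camera.
    blur = adjusted_defocus_mm(model, camera_to_wall_mm=3000.0, virtual_depth_mm=5000.0)
    print(f"Circle of confusion: {blur:.3f} mm")

In this sketch the presumed distance is simply the sum of the camera-to-display distance and the element's virtual depth, matching the function recited in claim 6; other functions of those two distances, and other blur models, could be substituted without changing the overall structure.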